Hash Tables

Do you think you could create a better hash function?

I do think I could create a better and simpler hash function. However, I would need to think much with different mathematical equations and other things that would change one input to another form. Just like what we did with Horner’s Rule hash function. I search it in the net, and it’s an algorithm used to simplify the process of evaluating a polynomial in a certain way, likewise with UNIX Elf Hash function.

Which hash function would you choose?

When I was searching in the net for more references about hash tables, I came across a function that dealt with the modulo operation. I think I would use that because it’s simpler. However, I would still need to dig in deeper to know if more complex is better or the simpler the better.

Perhaps experiment with using a different hash function than presented here.

I’ll experiment using the modulo hash function. What this does is that the hash function takes in the sum of the ASCII values of each character in a string and modulo it with 100 (or any different number).

I used the same “hash” function with my new hashFunctionModulo. However, I could also just make a simple hash function with just using hashFunctionModulo and without hash above. However, I wouldn’t make it a in the style of multiplicative hashing.

When just using the hashFunctionModulo, we can get values from 1-99 and put that into hash and this will generate an “address” for the key-value pair.

Code Used:

class ChainedHashTableHorner:
    def __init__(self):
        self.z = 2538972731
        self.w = 32
        self._dimension = 3 # value the checks the size and scales accordingly with resize
        self._table = [None] * (2 ** self._dimension)
        self._elements = 0

    def size(self):
        return len(self._table)

    def hashCode_Horner(self, string):
        h = 0
        for character in string:
            h = (31 * h) + ord(character)
        return h

    def hash(self, hashCode):
        # hash(x) = ((z ⋅ x) mod 2^w) / 2^(w-d)
        return int(((self.z * hashCode) % (2 ** self.w)) / (2 ** (self.w - self._dimension)))

    def resize(self):
        self._elements = 0 
        self._dimension += 1
        # Creates table for the original
        orig_table = self._table
        # Makes the new set of table
        self._table = [None] * (2 ** self._dimension)

        # Transfers the elements from the original table to the new
        for key in orig_table:
            if key is not None:
                for value in key:
                    self.add(value)

    def add(self, key):
        # If numbers of elements added is greater than the size of the table, then resize 
        if self._elements > self.size():
            self.resize()

        # getting the "index" of the "value" from the "key"
        index = self.hash(self.hashCode_Horner(key))

        # Checks if there is nothing in that index
        if self._table[index] is None:
            self._table[index] = [key]
            self._elements += 1
        
        # If there's already a value in the index, create the "linked list"
        else:
            if not key in self._table[index]:
                self._table[index].append(key)
                self._elements += 1
        
        return True


class ChainedHashTableUNIXElf:
    def __init__(self):
        self.z = 2538972731
        self.w = 32
        self._dimension = 3
        self._table = [None] * (2 ** self._dimension)
        self._elements = 0

    def size(self):
         return len(self._table)

    def hashCode_UNIX_Elf(self, word):
        h = 0
        for char in word:
            h = (h << 4) + ord(char)
            high = (h & 0xF0000000)
            if high != 0:
                h = h ^ (high >> 24)
            h = h & ~high
            h = h % (2**32)
        return h

    def hash(self, hashCode):
        # hash(x) = ((z ⋅ x) mod 2^w) / 2^(w-d)
        return int(((self.z * hashCode) % (2 ** self.w)) / (2 ** (self.w - self._dimension)))

    def resize(self):
        self._elements = 0 
        self._dimension += 1
        # Creates table for the original
        orig_table = self._table
        # Makes the new set of table
        self._table = [None] * (2 ** self._dimension)

        # Transfers the elements from the original table to the new
        for key in orig_table:
            if key is not None:
                for value in key:
                    self.add(value)

    def add(self, key):
        # If numbers of elements added is greater than the size of the table, then resize 
        if self._elements > self.size():
            self.resize()

        # getting the "index" of the "value" from the "key"
        index = self.hash(self.hashCode_UNIX_Elf(key))

        # Checks if there is nothing in that index
        if self._table[index] is None:
            self._table[index] = [key]
            self._elements += 1
        
        # If there's already a value in the index, create the "linked list"
        else:
            if not key in self._table[index]:
                self._table[index].append(key)
                self._elements += 1
        
        return True

input_dat = []

# opening file
with open('Week 6\input.dat', 'r') as input:
    input.readline()
    for line in input:
        input_dat.append(line.split())

# for line in input_dat:
#     print(line)

for line in input_dat:
    
    h = ChainedHashTableHorner()
    u = ChainedHashTableUNIXElf()

    for character in range(1, len(line)-1):
        h.add(line[character])
        u.add(line[character])
    
    # appends in the list the length of the values in each bucket
    he = []
    for i in h._table:
        if i is None:
            he.append(0)
        else:
            he.append(len(i))

    # checks where x is and appends the length in that bucket  
    for j in h._table:
        if j is not None:
            for k in j:
                if k == line[-1]:
                    he.append(len(j))

    # appends in the list the length of the values in each bucket
    ue = []
    for i in u._table:
        if i is None:
            ue.append(0)
        else:
            ue.append(len(i))
    
    # checks where x is and appends the length in that bucket  
    for j in u._table:
        if j is not None:
            for k in j:
                if k == line[-1]:
                    ue.append(len(j))
    
    # print(he)
    # print(ue)
    # print()

    for i in he:
        print(i, end = " ")

    print()
    
    for i in ue:
        print(i, end = " ")
    
    print()
        

# print(h._table)
# print(he)
# print(u._table)
# print(ue)

# print(u._table)
# # # print(input_dat)
# print(he)
# print(ue)

# sample = ChainedHashTableHorner()
# sample.add("over")
# print(sample._table)

Input.dat

3
9 the quick brown fox jumps over the lazy dog over
14 I would rather make a new one line long haiku than this sentence now haiku
18 somebody once told me the world was gonna roll me I ain't the sharpest tool in the shed world

Output

Leave a comment

Design a site like this with WordPress.com
Get started