Beginners Guide to Hash Tables and Complexity

New Reply

Programming Beginners Guide to Hash Tables and Complexity

Posted: Fri Dec 18, 2020 11:38 pm

Programming Beginners Guide to Hash Tables and ComplexityPosted: Fri Dec 18, 2020 11:38 pm

CriticaI

2 Million

Status: Offline

Joined: Nov 05, 201310Year Member

Posts: 2,737

Reputation Power: 448

Status: Offline

Joined: Nov 05, 201310Year Member

Posts: 2,737

Reputation Power: 448

Keep reading if you want to know how to create a hash table and why they are efficient data types.

First things first. Why are hash tables important?

Hash tables are essentially the same thing as declaring another variable. Sure variables are the best solution but what if you have an unknown amount of variables? Maybe your first thought is to store the values in an array, but that is not the best choice.

Why are arrays bad for multiple values?

Arrays are fine for small chunks of data. However, if you need to constantly filter/search through the array, then your program will be slow. Searching through an array is essentially trial and error until the correct item is found. Very slow. In big O notation, this is O(n)

To understand the performance difference of hash tables, you need to know Big O notation.

Loops, nested loops, and recursion all increase complexity. The goal is to be as close to O(1) as possible.
https://nedbatchelder.com/text/bigo_pix/011.png

How does a hash table work?

A hash table works by storing values in an array with a fixed size. Instead of appending the array, the index is calculated using a hash function. Essentially, this means the value will be stored randomly. When it's time the get the value, the index is recomputed and retrieved from the array.

What is the complexity of a hash table?

Hash tables have a complexity of O(1) except when there is a collision. Then the complexity is O(n). This is why collisions should be avoided or reduced.

Just show me the code:

The first step is to create a hashing algorithm.

Make sure your hashing algorithm does not add complexity. (I'm increasing complexity for the sake of making the code easier to understand). A salt is used to increase entropy. It is best to pick prime numbers as a salt because 0 and multiples of two are more likely to cause collisions. The hash function can be anything that produces a pseudorandom number. I chose to multiply the decimal values of each character in the string. For more efficient algorithms refer to this StackOverflow post.




def create_hash(plaintext, salt=43):


    total = salt


    for char in str(plaintext):


        total = total * ord(char)


    return total

Next, we need a compression algorithm.

Since the hash has a fixed size, we need a function that takes the big integer from the hash and produces another pseudorandom number that is smaller than the length of the array. Otherwise, we'll cause an overflow error. We'll use a common compression algorithm found in other languages. This is just math for now. Remember do not to add complexity!




a = random int


b = random int


h = hash from key


m = size of hash table array





c = (a * h + b) % m

Creating the hash table class

Since the compression requires some random values, its best to calculate those values as little as possible. So, we'll calculate it when a hash is made. Depending on the scenario it may be better to define these values as constants.




class hash_table():





    def __init__(self, size, salt=43):


        self.arr = [None] * math.ceil(size * 1.3)


        self.prime = 2147483647


        self.salt = salt


        self.a = random.randint(self.salt, self.prime)


        self.b = random.randint(self.salt, self.prime)





    def compress(self, key):


        return (self.a * create_hash(key, self.salt) + self.b) % len(self.arr)

Storing values and handling collisions

Collisions will happen. To handle them we will store all duplicates and we store the key and value in an array. Remember, using these arrays barely affects the complexity because we are reducing collisions. Most likely there will only be a few items in the array in the worst-case scenario O(n). If there is only one item in the array then the complexity is still O(1).




def set_item(self, key, value):


        lookup = self.compress(key)


        if self.arr[lookup]: 


            self.arr[lookup].append([key, value])


        else:


            self.arr[lookup] = [[key, value]]

Getting a value:

Now use the lookup to get all the collisions from a hash. When we loop over the collisions we return the value if the key matches. Break to reduce complexity.




def get_item(self, key):


        lookup = self.compress(key)


        if self.arr[lookup]:


            for collision in self.arr[lookup]:


                if collision[0] == key:


                    return collision[1]


        return None

Removing a key:

Removing a key is like getting a key. We search through the possible collisions and delete the key/value pair from the collision array if the key matches.




def remove_item(self, key):


        lookup = self.compress(key)


        if self.arr[lookup]:


            for index in range(len(self.arr[lookup])):


                if self.arr[lookup][index][0] == key:


                    del self.arr[lookup][index]


                    break

All together now.

Here is all the code and how to use it.




import math


import random





def create_hash(plaintext, salt=43):


    total = salt


    for char in str(plaintext):


        total = total * ord(char)


    return total





class hash_table():





    def __init__(self, size, salt=43):


        self.arr = [None] * math.ceil(size * 1.3)


        self.prime = 2147483647


        self.salt = salt


        self.a = random.randint(self.salt, self.prime)


        self.b = random.randint(self.salt, self.prime)





    def compress(self, key):


        return (self.a * create_hash(key, self.salt) + self.b) % len(self.arr)





    def set_item(self, key, value):


        lookup = self.compress(key)


        if self.arr[lookup]:


            self.arr[lookup].append([key, value])


        else:


            self.arr[lookup] = [[key, value]]


        


    def get_item(self, key):


        lookup = self.compress(key)


        if self.arr[lookup]:


            for collision in self.arr[lookup]:


                if collision[0] == key:


                    return collision[1]


        return None





    def remove_item(self, key):


        lookup = self.compress(key)


        if self.arr[lookup]:


            for index in range(len(self.arr[lookup])):


                if self.arr[lookup][index][0] == key:


                    del self.arr[lookup][index]


                    break








my_hash = hash_table(200) # -> arr[200]


my_hash.set_item('name', 'my name') 


my_hash.get_item('name')  # -> "criticai"


my_hash.remove_item('name')


my_hash.set_item('site', 'TTG')


my_hash.set_item('name', 'critical')


my_hash.get_item('name') # -> critical


my_hash.get_item('site') # -> TTG

Additional notes

Remember hash tables can have a complexity of O(1) but after more data is added it has a higher chance of colliding which creates a complexity of O(n).

Since the hash is compressed based on the size of the internal array, then to reduce collisions you should increase the internal array size.

Ideally, you should use a size that is 1.3 times the number of keys you plan to have.

The following 1 user thanked CriticaI for this useful post:

FOV (12-19-2020)

#2. Posted: Sat Dec 19, 2020 1:17 am

FOV

Mind Charity

Status: Offline

Joined: Aug 31, 201112Year Member

Posts: 1,883

Reputation Power: 8098

Motto: The extent of the observable world that is seen at any given moment.

Status: Offline

Joined: Aug 31, 201112Year Member

Posts: 1,883

Reputation Power: 8098

Motto: The extent of the observable world that is seen at any given moment.

This is very informative. Thanks for the guide!

0useful
0not useful

New Reply

Users browsing this topic: None

Programming Beginners Guide to Hash Tables and Complexity

Programming Beginners Guide to Hash Tables and ComplexityPosted: Fri Dec 18, 2020 11:38 pm

The following 1 user thanked CriticaI for this useful post:

RECENT POSTS

HOT TOPICS

Posts

30,609,300

Members

2,862,901

New Today

26

Topics

5,882,203

ProgrammingBeginners Guide to Hash Tables and Complexity

ProgrammingBeginners Guide to Hash Tables and ComplexityPosted: Fri Dec 18, 2020 11:38 pm

The following 1 user thanked CriticaI for this useful post:

RECENT POSTS

HOT TOPICS

Posts

30,609,300

Members

2,862,901

New Today

26

Topics

5,882,203

Programming Beginners Guide to Hash Tables and Complexity

Programming Beginners Guide to Hash Tables and ComplexityPosted: Fri Dec 18, 2020 11:38 pm