Keep reading if you want to know how to create a hash table and why they are efficient data types.
First things first. Why are hash tables important?
Hash tables are essentially the same thing as declaring another variable. Sure variables are the best solution but what if you have an unknown amount of variables? Maybe your first thought is to store the values in an array, but that is not the best choice.
Why are arrays bad for multiple values?
Arrays are fine for small chunks of data. However, if you need to constantly filter/search through the array, then your program will be slow. Searching through an array is essentially trial and error until the correct item is found. Very slow. In big O notation, this is O(n)
To understand the performance difference of hash tables, you need to know Big O notation.
A hash table works by storing values in an array with a fixed size. Instead of appending the array, the index is calculated using a hash function. Essentially, this means the value will be stored randomly. When it's time the get the value, the index is recomputed and retrieved from the array.
What is the complexity of a hash table?
Hash tables have a complexity of O(1) except when there is a collision. Then the complexity is O(n). This is why collisions should be avoided or reduced.
Just show me the code:
The first step is to create a hashing algorithm.
Make sure your hashing algorithm does not add complexity. (I'm increasing complexity for the sake of making the code easier to understand). A salt is used to increase entropy. It is best to pick prime numbers as a salt because 0 and multiples of two are more likely to cause collisions. The hash function can be anything that produces a pseudorandom number. I chose to multiply the decimal values of each character in the string. For more efficient algorithms refer to this StackOverflow post.
def create_hash(plaintext, salt=43):
total = salt
for char in str(plaintext):
total = total * ord(char)
Next, we need a compression algorithm.
Since the hash has a fixed size, we need a function that takes the big integer from the hash and produces another pseudorandom number that is smaller than the length of the array. Otherwise, we'll cause an overflow error. We'll use a common compression algorithm found in other languages. This is just math for now. Remember do not to add complexity!
a = random int
b = random int
h = hash from key
m = size of hash table array
c = (a * h + b) % m
Creating the hash table class
Since the compression requires some random values, its best to calculate those values as little as possible. So, we'll calculate it when a hash is made. Depending on the scenario it may be better to define these values as constants.
Collisions will happen. To handle them we will store all duplicates and we store the key and value in an array. Remember, using these arrays barely affects the complexity because we are reducing collisions. Most likely there will only be a few items in the array in the worst-case scenario O(n). If there is only one item in the array then the complexity is still O(1).