Hashtables and Hashmaps and Hashsets

In some languages, there’s a distinction between “Hashtable” and “HashMap” based on thread safety (i.e. in Java a HashMap is unsynchronized meaning multiple threads can modify the value a Hashtable is not) and null key/value support (HashMap allows a null key and null values, a Hashtable doesn’t). However it doesn’t seem that there’s a difference universally across all languages. (e.g. Python has neither but dictionaries have similar properties to HashMaps)

While HashMaps and Hashtables store key-value pairs, a Hashset is just a collection of elements.

What is a Hashtable/HashMap?

A Hashtable/HashMap is a collection of key-value pairs which can be accessed in O(1) time (worst case scenario O(n) more on that later). They are easy to confuse with an Array since both store multiple values, however there is one key difference, in an Array the values are “simply” stored next to each other in memory which we then can access using their index. The biggest issue with bigger arrays is when we need to find a value in an array.

For example if we want to check the capital city of France. We have an array of capitalCities which stores capital cities and their countries like this: "France = Paris". We would need to iterate through all the elements in the array and check each of them whether they include France or not. In the best case scenario we find the needed element relatively quick if it’s at the beginning of the array. In the worst case the string is not in the array at all and we wasted all this time going through the whole array for nothing.

This is where Hashtables/HashMaps perform much better. The problem is: we have no way of knowing whether or not an element exists in the array without checking the whole array until we find said element. In a Hashtable, we have key and value pairs like this: key: value. A key can be used as an identifier for a value. Before we store a key-value pair we first Hash the key and based off of the result determine where to store the key-value pair. A hash is a function that takes an input and creates a seemingly random output, however this output will always be the same assuming the input is also the same.

So for example, let’s say our key-value pair is: "France": "Paris". Let’s also say the maximum entries the Hashtable can have is 100. After hashing the key "France" the result could be something like this: 73, using this value we can now select a location in memory (called a bucket) and store the pair there. It’s different from language to language but let’s assume we pick the value by performing a Modulo operation using the maximum entries in our table. In this case we would do: 3965273 % 100 which is 73 (The random number is "France" - I wont go into detail how strings are converted to numbers to create a hashkey). To now store the value, we first reserve a 100 buckets since this is the maximum entries we can have and then store our "France": "Paris" pair in bucket 73.

After some time the table would be filled up with countries and capitals around the world. If we now want to find the capital of France instead of checking all the elements we can simply again calculate the hash for "France" (since our Hashtable uses the country as the key) and get the same result as before (e.g. 73). Based on this result we would then again calculate the place in memory where our value is stored and come again to the bucket number 73. Then we can simply retrieve the value "Paris" and that’s it. This way we got the value we were looking for in the quickest way possible!

Normally the difference between HashMaps and Hashtables is defined as follows:

HashMap:
- A HashMap is a data structure that stores key-value pairs and allows for fast retrieval based on the key.
- It allows one null key and multiple null values and maintains no order of its elements.
- It is generally faster than a Hashtable because it is unsynchronized, meaning that it is not thread-safe by default and doesn’t incur synchronization overhead unless explicitly synchronized.
Hashtable:
- Similar to HashMap, Hashtable also stores key-value pairs. However, it is synchronized, meaning it is thread-safe and can be shared between multiple threads without causing data corruption.
- Hashtable does not allow any null key or null value, which is a significant difference from HashMap.
- Due to its synchronization, it is slower compared to HashMap, especially in scenarios where concurrent access isn’t needed.

The above list is specifically applicable to Java since the terms HashMap, and Hashtable are more commonly associated with Java but other languages obviously have similar data structures.

HashSets

A hashset is basically a HashMap but instead of using a key-value pair it just uses singular elements. It’s also not synchoronized and allows one null element. This is suitable if you don’t need to store key-value pairs but just want to have a larger list of elements simillar to an array that you want to check something against. (i.e. is a user registered?)

Time Consumption & Avoiding collisions

One issue that Hashes have is causing collisions. This means that 2 different inputs can generate the same hash output. So for example the keys "France" and "Germany" could result in the same hash output of 73 and based on that output both should be stored in bucket 73 which obviously doesn’t work. There are many ways to avoid this issue but one example is creating a Linked List in the bucket when this happens. So if 2 pairs end up being stored in the same bucket we would create one element containing "France": "Paris" and which links to the next element "Germany": "Berlin".

Later when someone looks up the key "Germany", we again calculate the hash, find the bucket 73 and then simply iterate through the linked list until we find the key we need. This obviously takes longer than O(1) so that’s why we say a HashMap takes in most cases O(1) time and in the worst cases O(n).

Worst case is actually O(log n)!

modern implementations of hash tables, such as Java’s HashMap, use balanced trees instead of linked lists for buckets with many hash collisions, which improves worst-case performance from O(n) to O(log n).

Python

In Python there is no direct “Hashtable” or “HashMap” class, the Data Structure most similar to that is called a Dictionary which is closest to a HashMap since they allow null values and one null key.

Python Dictionaries have some features that traditional HashMaps don't

dictionaries in Python as of version 3.7 and beyond are guaranteed to maintain insertion order, which is a feature not traditionally associated with either Hashtables or HashMaps in other languages. Dictionaries and HashMaps/Hashtables are similar but yet different and the language doesn’t use those terms either.

dict_empty = {} # Create an empty dictionary
 
dict_ints = { 1: True, 2: False } # Dictionary that uses ints as keys and stores booleans
 
dict_strings = { 'One': 'Malphite', 'Two': 'Graves' } # Keys and values are both strings
 
dict_strings['Three'] = 'Ahri' # Adding an element to the dictionary
 
sample = {'Four': 'Fiora', 'Five': 'Jinx' } 
dict_strings.update(sample) # Merging dictionaries
 
dict_strings.pop('Two', None) # Delete key Two and return None if key doesn't exist
 
del dict_strings['One'] # Will remove key One and raise KeyError if key doesn't exist
 
print(dict_strings['Three'])
 
hash_set = { 'Xayah', 'Rakan' } # To create a Hashset just use the same syntax but don't provide the values

Java

Links

https://www.youtube.com/watch?v=RcZsTI5h0kg Best explanation: https://www.youtube.com/watch?v=FsfRsGFHuv4 Difference HashMap/Hashtable in Java specifically: https://chat.openai.com/share/80bdd708-d3e7-4347-b118-8cb941ae23e8

🦤 DoDoFarm

Explorer

Hashtables and Hashmaps and Hashsets

What is a Hashtable/HashMap?

HashSets

Time Consumption & Avoiding collisions

Python

Java

Links

Graph View

Table of Contents

Backlinks