The username that's already taken
You're signing up for a service. You type a username, tab out of the field, and before your finger leaves the key the page already knows whether it's taken. Fifty crore other usernames sit in some database, and the answer came back in a few milliseconds. Nothing scanned. No log-of-n binary search either. There's one calculation behind that answer — and the rest of this lesson is about what that calculation is.
A hash map is an array you index with a string
The trick is the same one arrays already taught: turn a question about position into one address calculation. Arrays do this with an integer index — address = base + i × size. Hash maps add one extra step in front: turn the key (a string, an object, anything) into an integer first.
hash(key) → some integer
hash(key) % capacity → an array index
That's the whole structure. The "array" here is called a bucket array. The integer that comes out of hash(key) is what lets you skip straight to a bucket. Everything else — chains, resizes, load factors — is bookkeeping for what to do when two keys land on the same bucket.
The visualiser starts with eight empty buckets and a deliberately tiny hash function: sum the character codes of the key, take it mod 8. Step through set("ace", 1) and set("bee", 2) and watch the hash strip resolve, then drop into the bucket array. Two operations, two buckets, no collisions yet.
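If you want to poke at the same toy hash outside the visualiser, here is a minimal JavaScript sketch of it (the standalone hash function and usage lines are our reconstruction; the visualiser may structure this differently):

// Toy hash from the visualiser: sum the key's character codes.
function hash(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum;
}

hash("ace") % 8; // (97 + 99 + 101) % 8 = 297 % 8 = 1, so bucket 1
hash("bee") % 8; // (98 + 101 + 101) % 8 = 300 % 8 = 4, so bucket 4

Note the two-step shape from above: hash(key) produces the integer, and the caller takes it mod capacity to get the bucket index.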
Hash function is sum of char codes, mod 8. Char codes: c=99, a=97, b=98. Which bucket does "cab" land in?
If you got that on the first try, you've already grasped the spine of the topic. Hashing is doing the heavy lifting; the bucket array is just a place for the answer to land.
What happens when two keys want the same bucket
Look at what set("ace", 1) and set("eca", 4) have in common. Same characters, same sum: 97 + 99 + 101 = 297. Same bucket: 297 mod 8 = 1. The bucket already holds ("ace", 1) when ("eca", 4) arrives. Watch the visualiser handle this case.
Bucket 1 already holds ("ace", 1). We're inserting ("eca", 4), which also hashes to bucket 1. Which picture matches the bucket immediately after this insert completes?
Both entries stay. The bucket holds a chain — a tiny list of entries that hashed to the same slot. get("eca") re-runs the hash, lands on bucket 1, then walks the chain until the keys match. get("ace") does the same walk but stops on the first entry. The hash gets you to the bucket; the chain walk finds the entry inside the bucket.
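In code, that lookup is one hash, one index, and a short walk. A minimal sketch, assuming each bucket is a plain array of { key, value } entries (the buckets and capacity field names are our assumption, not a fixed API):

// get on a chained hash map: hash to one bucket, walk only that chain.
function get(map, key) {
  const i = hash(key) % map.capacity; // straight to the right bucket
  for (const entry of map.buckets[i]) {
    if (entry.key === key) return entry.value; // chain walk: compare keys
  }
  return undefined; // chain exhausted: key absent
}

For get("eca"), the loop walks past ("ace", 1), fails the key comparison, and returns 4 from the second entry.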
This is the moment most learners flatten. They've been told "hash maps are O(1)" and assume the lookup is one step. It is one step on average — when chains are short. It is not one step in the worst case.
We just walked a chain of 2 entries. Imagine a worst-case map where every one of the 1,000 keys hashed to the same bucket. How many comparisons does get need in the worst case?
The "1,000 in one bucket" case is not theoretical. In 2011, researchers showed that web servers running PHP, Java, ASP.NET, and Python were all vulnerable to algorithmic complexity attacks: an attacker submitting form data with strategically chosen keys could force a single POST request to pin a CPU. The fix industry-wide was to randomise the hash seed per process, so an attacker can't know in advance which keys will collide. That's why Python ≥ 3.3 randomises its string hashes, and why Java's HashMap falls back to a red-black tree once a bucket has 8 collisions: a worst-case O(n) chain walk became unacceptable at internet scale.
The order of things matters
Walking a chain is searching. Hashing is computing a slot. The order in which these happen is what makes hash maps fast — and getting the order wrong is one way learners convince themselves that hash maps are "secretly just lists."
Drag these four steps into the order in which they happen during get("eca") on a chained hash map.
- 1. Compare key in bucket entry to "eca"
- 2. Compute hash("eca")
- 3. Read the bucket array at index hash mod capacity
- 4. Take hash mod capacity to get a bucket index
Notice what step is missing from that order: scanning the bucket array. The hash takes you straight to the right bucket. You never look at any other bucket. That is the entire reason this structure beats a list — and the entire reason it falls apart if every key happens to hash to the same place.
Load factor and the resize that nobody pictures
Inserts are cheap on average. They become expensive only at one specific moment: when the bucket array gets too full and the table doubles. "Too full" has a precise meaning here. The load factor is the ratio of entries to buckets, and most implementations trigger a resize when it crosses about 0.75.
function set(map, key, value) {
  const i = hash(key) % map.capacity;
  const chain = map.buckets[i];
  const existing = chain.find(e => e.key === key);
  if (existing) { existing.value = value; return; } // overwrite: no growth
  chain.push({ key, value });
  map.size++;
  if (map.size / map.capacity > 0.75) resize(map);
}
When resize fires, the table allocates a new bucket array — usually double the size — and rehashes every existing entry into the new array. Same keys, fresh hash(key) % newCapacity. Some entries land in the same numeric bucket; others land far away. The old array gets thrown away.
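A sketch of that rehash, under the same toy-map assumptions as before (the helper name matches the resize call in the set snippet above; the doubling policy is the one the text describes):

// Allocate a doubled bucket array and rehash every entry into it.
function resize(map) {
  const newCapacity = map.capacity * 2;
  const newBuckets = Array.from({ length: newCapacity }, () => []);
  for (const bucket of map.buckets) {
    for (const entry of bucket) {
      newBuckets[hash(entry.key) % newCapacity].push(entry); // fresh index
    }
  }
  map.buckets = newBuckets; // the old array is garbage now
  map.capacity = newCapacity;
}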
The map currently has 7 entries in 8 buckets. The next set crosses the 0.75 load factor threshold and triggers a resize. After the insert and resize complete, what are the new capacity and load factor?
This is the cost most "O(1)" framings hide. Inserting one entry can sometimes mean rehashing the previous N entries. The math works out to amortised O(1) per insert, because doubling makes the costly resizes rare enough — but if you're allocating a fresh map inside a hot loop and inserting a million keys, you'll trigger ~20 resizes during the fill. Pre-sizing the map (new HashMap<>(expectedSize) in Java, make(map[K]V, n) in Go) is one of the cheapest performance fixes in the language.
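In toy-map terms, pre-sizing just means picking the starting capacity from the expected entry count so the fill never crosses the threshold (createMap is our illustrative constructor, not part of the lesson's API):

// Choose a capacity that keeps n entries under the 0.75 load factor.
function createMap(expectedSize) {
  let capacity = 8;
  while (expectedSize / capacity > 0.75) capacity *= 2;
  return { capacity, size: 0, buckets: Array.from({ length: capacity }, () => []) };
}

createMap(1_000_000); // capacity 2,097,152: a million inserts trigger zero resizes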
The bug that makes entries vanish
Hash maps assume the keys hold still. If a key's hash changes after it has been inserted, the entry is still in the bucket array — but the lookup goes looking in the wrong bucket and reports the entry missing. The map is silently corrupted.
A developer is using a map keyed by user objects. They're confused that map.get(user) returns undefined even though they just put user in. Which line is the bug?
This is why Python forbids mutable types as dictionary keys at the language level: lists, dicts, and sets are unhashable. You have to convert to a tuple or freeze the set first. Java trusts you instead — if you override equals and hashCode, it's your job not to mutate the relevant fields after using the object as a key. Most production Java bugs around HashMap trace back to violating this contract.
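Here is the vanishing act in toy-map terms, hashing a user object by a mutable field (hashUser and the user shape are ours, for illustration; hash is the char-code sum from earlier):

// A key whose hash depends on mutable state.
function hashUser(user) {
  return hash(user.name);
}

const user = { name: "ace" };
const bucketAtInsert = hashUser(user) % 8; // 297 % 8 = 1: entry stored in bucket 1
user.name = "bee";                         // the key mutates *after* insertion
const bucketAtLookup = hashUser(user) % 8; // 300 % 8 = 4: lookup walks bucket 4
// The entry still sits in bucket 1, so the lookup reports it missing.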
When the hash map is the wrong answer
Hash maps are almost always the right structure for keyed lookup. They become the wrong structure the moment you need order.
For which of these workloads is a hash map the worst fit?
Range queries, sorted iteration, finding the smallest or largest key, prefix searches: these all need a structure that preserves order. Hashing destroys order. That's why your database's primary-key index is usually a B-tree (orderable, range-queryable) and not a hash table, even though the hash table would be cheaper per individual lookup.
The other case where hash maps lose: when keys are dense small integers (0..n). A plain array of length n is faster, uses less memory, and never hits collisions or resizes. Reach for a hash map when keys are sparse, irregular, or non-integer.
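For the dense-integer case, the "hash" is the identity and the bucket array needs no chains, which is just an array (a tiny sketch with assumed sample data):

// Dense small-integer keys: the key IS the array index.
const data = [7, 7, 42, 255, 7];       // sample byte values
const counts = new Array(256).fill(0); // one slot per possible key, 0..255
for (const b of data) counts[b]++;     // no hash, no collisions, no resize
counts[7]; // 3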
Where this shows up in real systems
The structure is so useful that almost every major runtime ships its own carefully tuned variant.
- Python: dict uses open addressing with perturbation probing. Source lives in Objects/dictobject.c. Insertion order has been preserved since 3.6 (3.7 made it a language guarantee), achieved by storing entries in a dense array and indices in a separate sparse table, a layout designed by Raymond Hettinger that also cut dict memory by about 25%.
- Java: HashMap uses separate chaining like the visualiser, with one twist. When a single bucket grows past 8 entries, the chain converts to a red-black tree (TREEIFY_THRESHOLD = 8). This caps worst-case lookup at O(log n) even when the hash function is misbehaving or under deliberate attack.
- Go: map is a bucket array where each bucket holds 8 key/value slots, with overflow buckets chained on. Iteration order is deliberately randomised on each range to prevent code from accidentally depending on it. Go 1.24 introduced Swiss-table-based maps for better cache behaviour.
- Rust: HashMap is backed by hashbrown, a port of Google's Abseil Swiss tables. It uses SIMD to scan a control-byte array, finding matches in one or two cache lines. The default hasher is SipHash-1-3 (DoS-resistant); you can swap in FxHash or aHash for non-adversarial workloads.
- Redis hashes (HSET/HGET): incremental rehashing. Instead of stopping the world to rehash a 10-crore-entry hash, Redis keeps both old and new tables alive and migrates a few buckets per command until the old table is empty. Critical because Redis is single-threaded: a synchronous resize would freeze every other client.
- Cassandra / Kafka partitioning: Cassandra's Murmur3Partitioner hashes a row key into a 64-bit token that decides which nodes own it; Kafka's default partitioner does the same job with Murmur2. Both hashes are non-cryptographic, fast and well-distributed, which is exactly what you want for partitioning and badly broken for security.
The username-availability check at the top of this lesson is doing something close to the simplest version of this. Hash the candidate username. Look in the bucket. Walk the chain (usually empty, sometimes one entry, rarely more). Return yes or no. Fifty crore entries don't matter; the cost is the hash and the chain walk, not the count.
What to learn next
- Hash sets — the same structure with the value field thrown away. Membership testing in O(1).
- Sliding window (variable) — the first pattern that combines arrays with a hash map for counting.
- Bloom filters — what you reach for when even a hash map is too expensive in space.