The coffee shop trick
Queue at a coffee shop. The customer at the front pays, takes their drink, and leaves. The rest of the line does not physically shuffle forward by one step — the barista's attention moves, but nobody walks. Now open your browser console and write a queue in JavaScript: q.push(x) to add, q.shift() to take from the front. You expect both operations to be O(1). Half right. shift() re-indexes every remaining element — the line does shuffle forward, every single time. A million enqueues followed by a million dequeues is O(n²). Coffee shops solved this centuries ago. Most queue code in production hasn't.
A queue is two pointers, not a moving line
The interface of a queue is FIFO — first in, first out. Add at the back, remove from the front. The hard part is keeping both ends cheap, because the obvious implementation makes one of them quietly expensive.
Watch what happens when we do it the wrong way and the right way side-by-side.
The visualiser above renders two panels stacked vertically. The top panel is the naive queue — a JavaScript array with push and shift. The bottom panel is the ring buffer — a fixed array with two indices, head and tail, that walk forward and wrap around. Drive enqueue('A'), then enqueue('B'), then enqueue('C'), then dequeue(). The first three operations look identical. The fourth is where everything diverges.
In the naive panel, dequeueing A causes B and C to physically slide one slot to the left. The state panel's shifts counter ticks up to 2. In the ring panel, dequeueing A does nothing of the sort — the head arrow advances from slot 0 to slot 1, and B and C stay exactly where they are. The shifts counter for the ring panel stays at 0.
That gap is the lesson. Both panels did the same logical work. Only one of them paid for it.
A `NaiveQueue` (built on `Array.prototype.shift`) holds 1,000,000 elements. You call `dequeue()` 1,000,000 times until it is empty. Roughly how many element-shift operations does the CPU perform in total?
If you said anything other than ~500 billion, the trap is doing its job. The textbook claim "removing from the front of an array is O(1)" silently assumes a ring buffer or a deque, not Array.prototype.shift. In every mainstream language — JavaScript, Python, Java's ArrayList, C++'s std::vector — removing the first element of an array reindexes everything after it. A FIFO built on top of that is doing two orders of magnitude more work than it advertises.
The ring buffer: the production answer
The ring buffer fixes the trap with one observation. The line at the coffee shop does not move because no customer's position changes when the front leaves — only the barista's attention changes. Translate that into code: keep the elements where they are, and move two integers instead. One integer (head) tracks where the next dequeue reads from. Another (tail) tracks where the next enqueue writes to. Both walk forward through a fixed array. Neither value ever causes physical movement of any element.
class RingQueue<T> {
buf: (T | undefined)[];
head = 0;
tail = 0;
count = 0;
constructor(public cap: number) {
this.buf = new Array(cap);
}
enqueue(v: T): boolean {
if (this.count === this.cap) return false; // full
this.buf[this.tail] = v;
this.tail = (this.tail + 1) % this.cap;
this.count++;
return true;
}
dequeue(): T | undefined {
if (this.count === 0) return undefined; // empty
const v = this.buf[this.head];
this.buf[this.head] = undefined;
this.head = (this.head + 1) % this.cap;
this.count--;
return v;
}
}
Eight slots, two indices, one counter. That is the entire data structure. Every operation is constant work — one bounds check, one read or write, one modulo, one decrement or increment. Nothing scales with the number of elements in the queue.
The modulo is what makes the array a ring. When tail reaches the last slot and the next enqueue arrives, (tail + 1) % cap evaluates to 0 instead of cap. The arrow visibly jumps from the right edge of the buffer back to slot 0. The same thing happens to head on the dequeue side. Once both indices have wrapped, the live elements occupy a contiguous span that crosses the array boundary — physically split, logically continuous.
Current ring state: `capacity = 8`, `head = 5`, `tail = 7`, `count = 2`. Slot 5 holds `B`, slot 6 holds `C`. You call `enqueue('X')`. Which of these matches the state immediately after the call completes?
The mental rule for enqueue is: write at tail, then advance tail modulo capacity. If you mix up the order — advance first, then write — the next enqueue overwrites the value you wrote. Order matters; it is not interchangeable.
Ring state: `capacity = 8`, `head = 5`, `tail = 2`, `count = 5`. You call `enqueue('Y')`. What is the new value of `tail` after the call completes?
Once you have the formula, the rest is mechanical. Every wrap is the same arithmetic. Every advance is one modulo away from being correct.
When head == tail, who's right?
The ring buffer has one ambiguity that bites every implementer once. After enough enqueues with no dequeues, tail walks all the way around the buffer and meets head again. After enough dequeues with no enqueues, head walks all the way around and meets tail. In both cases, head == tail. One state means full; the other means empty. The two integers alone cannot tell you which.
Starting state: `capacity = 4`, `head = 0`, `tail = 0`, `count = 0` (empty). Apply this sequence: `enqueue('A')`, `enqueue('B')`, `enqueue('C')`, `enqueue('D')`. After all four operations, what does the ring look like?
The conventional fix is the third field in the implementation above: count. Whenever count == capacity, the ring is full. Whenever count == 0, the ring is empty. The head == tail overlap loses its ambiguity because count settles the question — and it costs one integer per queue, plus one increment and one decrement per operation.
The other fix some implementations use is to deliberately never fill the buffer completely — leave one slot perpetually empty as a sentinel. Then head == tail always means empty, and (tail + 1) % cap == head always means full. This trades one slot of capacity for the disambiguation, which is occasionally worth it on memory-constrained hardware. For application code, the explicit count is almost always cleaner.
The order is the algorithm
Ring buffer code is short, but every line has a reason. Get the order wrong and the invariant breaks.
Drag these four operations into the correct order for one `enqueue(v)` call on a non-full ring buffer.
- 1.if (this.count === this.cap) return false
- 2.this.buf[this.tail] = v
- 3.this.tail = (this.tail + 1) % this.cap
- 4.this.count++
Bounds-check first, because writing into a full buffer would corrupt a live element. Write second, because you need the slot value present before you advance past it. Advance third, because the next enqueue depends on tail pointing at the next free slot. Increment count last, because it confirms the operation succeeded. Reorder these and the queue stops being a queue.
The bug to watch for is more subtle than line-ordering, though. It hides in the fullness check itself.
Ring state: `capacity = 4`, `head = 1`, `tail = 1`, `count = 4`. Slots hold `[D, A, B, C]` — the live region wraps from slot 1 around through slot 0. Here is a four-step trace claiming to enqueue `X`. One step is wrong. Click it.
The head !== tail shortcut is wrong precisely because head == tail is ambiguous. A bounds check that uses pointer equality will let an enqueue proceed into a full ring, overwriting a live element that hasn't been dequeued yet. The fix is to check count, not pointer positions. This is the canonical ring-buffer bug and it ships in production code with surprising regularity — usually written by someone who reasoned about the empty case (where head == tail does mean empty), and assumed the full case worked the same way by symmetry. It doesn't.
Where this shows up in real systems
Ring buffers are everywhere in systems code. Three places worth knowing about, each operating at a wildly different scale:
The Linux kernel ships a generic ring-buffer implementation called kfifo in include/linux/kfifo.h. It is used everywhere a FIFO is needed inside the kernel — TTY drivers, USB transfer queues, the perf event ring, audio sample buffers. The implementation requires capacities that are powers of two, which lets it replace the modulo with a bitmask (& (cap - 1)) and avoid a division per operation. That single-instruction wrap is the difference between a ring buffer that runs at full memory bandwidth and one that doesn't.
The LMAX Disruptor is a lock-free ring buffer originally written for high-frequency trading at the LMAX exchange. Martin Thompson's 2011 talk demonstrated 6 million ops per second on a single thread, with no garbage-collection pressure. The trick was a ring buffer plus cache-line-padded sequence numbers plus careful memory ordering. Lock-free is a topic of its own, but the structural skeleton is the same one the visualiser shows above: two indices, a fixed array, and modulo arithmetic.
Go channels are the language-level FIFO primitive, and every Go channel is a bounded ring buffer. The runtime defines hchan in runtime/chan.go as a struct holding buf (the ring), sendx (tail), recvx (head), and a count. Goroutines that find the channel full or empty are themselves queued in linked lists hanging off the same struct, so you end up with a ring buffer of values plus two intrusive linked lists of waiters. A channel is, internally, two queue implementations layered.
There is one place where you almost certainly don't want a queue, even though the problem looks like FIFO:
You are exploring a maze level by level — visit all squares 1 step away, then all squares 2 steps away, then all 3 steps away — to find the shortest path to the exit. The frontier of 'discovered but not yet explored' squares lives in some structure. Which structure does this algorithm need?
The reverse is also worth holding on to: stacks turn breadth-first into depth-first. The two structures are not interchangeable. They are mirror images of each other, and the choice between them is the choice between two different algorithms. This is why the graph-traversals lesson opens with "BFS uses a queue, DFS uses a stack" as a single sentence — the queue isn't an implementation detail of BFS, it is BFS.
When the naive version is fine (and when it isn't)
The honest story: array.shift() is fine when the queue is small and the throughput doesn't matter. A queue of 50 pending UI events being processed once per frame, drained one at a time? shift() is invisible in the profiler. A queue of 50 million log entries being drained at 100k entries per second? shift() is the entire profiler.
The threshold where naive becomes fatal varies by language and workload, but the rule of thumb is: as soon as the queue grows past a few thousand elements, or the dequeue rate exceeds about 10k per second, the O(n) per dequeue becomes a measurable cost. Below that, optimise readability instead. Above it, reach for a ring buffer immediately — Deque in Java, collections.deque in Python, a hand-rolled RingQueue in TypeScript, or your language's equivalent.
The other trade-off worth flagging: ring buffers are bounded by construction. You decide the capacity at allocation time, and once it fills, you have to make a choice — block the producer, drop the oldest, drop the newest, or grow the buffer (which involves copying everything to a larger array, exactly like Array.prototype.push does on resize). Unbounded queues sound safer because they never reject, but they're the textbook way to turn a slow consumer into an out-of-memory crash. A bounded queue forces you to make the backpressure decision at design time, which is where it belongs.
What to learn next
- Deques — the same ring-buffer trick extended to push and pop at both ends. Used by Python's
collections.deque, Java'sArrayDeque, and the sliding-window patterns in the intermediate tier. - Heaps and priority queues — what "queue" means when the dequeue order isn't FIFO but "smallest first". A heap is structurally a tree, not a ring, and the operations are O(log n) rather than O(1).
- BFS and DFS — where the FIFO ordering of a queue stops being implementation detail and starts being algorithm. The reason BFS finds shortest paths in unweighted graphs is exactly the queue's
head < tailordering.