Trees (general) — Algorithm Learning Platform

A path through a hierarchy

Type cd ~/Documents/Projects/elearning into a terminal. Each / is one step from a parent directory into one of its children. Your shell follows the path the same way the previous lesson's linked-list cursor followed next pointers — except now each node has many nexts, and the shell has to pick the right one. Strip the slashes and what remains is the entire data structure: nodes, children, a root somewhere up top. Filesystems, the DOM, your React component tree, the AST your TypeScript compiler walks — all the same shape.

A node is a value and a list of children

A linked-list node held one next pointer. A tree node holds many — usually as an array.

type TreeNode<T> = {
  value: T;
  children: TreeNode<T>[];
};

Nothing else. The tree itself is a single pointer — root — that knows where the top node lives. Everything else is reached by walking down through children arrays. There is no index, no formula, no parent field on the node. Just values and arrows fanning out.

This one change — replacing the single next with an array of children — is enough to model every hierarchy you will meet for the rest of your career. Files in folders. Elements in elements. Components in components. Statements in functions in modules. Once the shape clicks, you stop seeing trees as a textbook chapter and start seeing them as the default way data is shaped in the wild.

The visualiser builds a small filesystem-like tree above. The root is ~; it has three children; one of those has its own children; one of those has its own. Eight nodes total. Watch the build animation: addChild(parent, value) is the one primitive — given a parent and a value, the child slots into the parent's children array in constant time. Just like prepending to a linked list, the new node never has to "find its place." The parent already knows where its children live; pushing one more is a single array write.

The vocabulary you cannot avoid

Every conversation about trees uses six words. The visualiser pauses on the built tree and labels them on the picture, but they're worth naming directly.

Root: the node with no parent. There is exactly one.
Leaf: a node with no children. There can be many.
Internal node: any non-leaf node — the root counts if it has at least one child.
Parent / child: an arrow from A down to B makes A the parent of B, and B a child of A.
Subtree: pick any node, mentally cut the arrow above it. What's left below is itself a tree, with that node as its root. This is the recursive bit, and it's where every algorithm in this lesson comes from.
Depth: the number of arrows from the root down to a node. The root has depth 0; every step away adds 1.
Height: the number of arrows from a node down to its deepest leaf. Leaves have height 0; the whole tree's height is the height of the root.

Depth and height feel similar and are routinely confused — but they are measured from opposite ends. A leaf's depth depends on the whole tree above it; its height is always 0. The root's depth is always 0; its height depends on the whole tree below it.

The leaf `elearning` is labelled `depth = 3`. What is `depth = 3` measuring?

The number of arrows on the path from the root down to this node.The number of children this node has.The height of the whole tree.The position of this node in its parent's `children` array.

Depth grows as you walk away from the root. Height grows as you walk toward the deepest leaf. Both are integers, both count edges (not nodes), and you will mix them up at least once before the end of this page.

Walking a tree, two ways

Trees are useless without a way to visit every node. There are two natural orders, and the difference is one data structure.

The first is depth-first search — go as deep as possible into the first child, then back up, then into the next child, and so on. The shape of the recursion mirrors the path from root to current node:

function dfs(node) {
  if (node === null) return;
  visit(node);
  for (const child of node.children) {
    dfs(child);
  }
}

Six lines. The recursion is the structure: dfs(node) handles the current node, then defers the rest of the tree to dfs(child) calls. Every tree algorithm you will write for the next twenty topics has this shape. Memorise it once, vary the body.

In the visualiser above, watching pre-order DFS run is also watching the call stack grow and shrink. Each dfs(child) call layers a frame on top of the current one; each return pops a frame. The stack is, at any moment, the path from the root to whichever node we're standing on. When the cursor reaches the deepest leaf, the stack is at peak height — exactly the height of the tree.

DFS pre-order is partway through. The cursor has just stepped into `Projects`. What does the call stack look like at this exact moment, bottom to top?

The stack reaches dfs(elearning) only because dfs(~), dfs(Documents), and dfs(Projects) are still open above it, waiting for their child loops to finish. Recursion layers; it does not replace.

There are two flavours of DFS in everyday use, and the only difference is when the visit happens.

// Pre-order — visit, then recurse
function dfsPre(node) {
  visit(node);
  for (const c of node.children) dfsPre(c);
}

// Post-order — recurse, then visit
function dfsPost(node) {
  for (const c of node.children) dfsPost(c);
  visit(node);
}

Pre-order prints the node on the way down; post-order prints it on the way back up, after every descendant has been seen. The post-order shape is exactly what rm -rf does on a directory: delete the children before the parent, otherwise the parent is gone before its children are reachable.

For the subtree `Documents → [Projects → [elearning, analytics], notes.md]`, drag the nodes into the order a post-order DFS visits them.

1.Documents
2.Projects
3.elearning
4.analytics
5.notes.md

The other walk replaces the call stack with a queue.

function bfs(root) {
  const queue = [root];
  while (queue.length > 0) {
    const node = queue.shift();
    visit(node);
    for (const child of node.children) {
      queue.push(child);
    }
  }
}

This is breadth-first search, and the only thing that changed is the data structure on the side. A stack (implicit, in DFS's recursion) gives you depth-first; a queue gives you level-order — every node at depth d visited before any node at depth d+1. Same picture, different scratchpad. The visualiser swaps the call-stack panel for a queue panel and runs the same eight nodes; the order changes completely.

The choice between DFS and BFS is almost always about which node you want to find first. Looking for the deepest match? DFS goes there fastest. Looking for the shallowest? BFS gets there first.

Recursion as the default shape

Every non-trivial tree algorithm has the same skeleton: handle the current node, recurse into the children, combine the results. Counting nodes, summing values, finding the maximum, computing the height — all of them collapse into the same six lines, with one detail changed.

Height is the canonical example.

function height(node) {
  if (node.children.length === 0) return 0;
  let max = -1;
  for (const child of node.children) {
    max = Math.max(max, height(child));
  }
  return 1 + max;
}

The base case is a leaf — height 0, no recursion. The recursive case computes each child's height, takes the max, adds 1. That's the entire idea. Watch the visualiser run it: leaves resolve to 0 first, then their parents bubble up to 1, then the next level to 2, then the root to 3. The recursion unwinds bottom-to-top, and each node's annotation lights up as its return value becomes known.

A tree with exactly one node — just the root, no children — has height = ?

A one-node tree's height is 0 because height counts arrows, not nodes — and a single node has no arrows under it. The base case node.children.length === 0 returns 0 directly. The convention height(null) = -1 exists for empty trees, where no node exists at all; that is a different case from a tree containing one node.

The same six-line shape — recurse into children, combine, return up — is what computes size (replace Math.max with +, return 1 + sum), depth from a target (track distance, return early on hit), or "does any descendant satisfy this predicate?" (return true if any child does). The pattern is more general than any individual algorithm.

What a tree is not

Trees are easy to draw, and that is half the trouble — every shape that looks tree-like is not actually one. Three failure modes worth naming.

A cycle turns a tree into a graph. Adding an arrow from a descendant back to an ancestor — even a single one — breaks the "exactly one path between any two nodes" invariant. Every recursive algorithm in the previous section becomes an infinite loop. The DOM technically allows you to attach the same element as a child of two different parents, and browsers go to some lengths to detect and prevent it; that's not arbitrary fussiness, it's the cycle invariant being defended.

A disconnected structure is also not a tree — every node must be reachable from the root, and there is exactly one root. Two separate trees together are a forest, which is a different (still useful) thing.

A degenerate tree — every node has exactly one child — is technically a tree but useless: it's a linked list wearing a tree's clothes. Every operation that should have been logarithmic in a balanced tree becomes linear. This is what a binary search tree turns into if you insert sorted data without rebalancing, and it's why entire later topics exist (AVL, red-black) just to keep trees balanced.

Four small node-and-arrow diagrams are described below. Three are valid trees. Which one is not a tree?

Root R has children A, B, C, and C also has an arrow back to R.A single isolated node with no children.A root with three children, each of which is a leaf.A vertical chain: root → child → child → child.

The thing to internalise: "tree" is not a shape. It's a set of invariants — connected, acyclic, one root, one parent per non-root. Any structure that violates them is something else, even if it's drawn the same way.

DFS or BFS — which one when?

The difference between depth-first and breadth-first is small in code and large in behaviour. The right choice depends on what you're looking for and where.

Two tasks: (1) print every file under `~/Documents/`, deepest paths first — children before their containing folder. (2) Find the *shallowest* node in the tree whose value matches a target. Which traversal fits each?

(1) DFS post-order. (2) BFS.(1) DFS pre-order. (2) BFS.(1) DFS post-order. (2) DFS (any order).(1) BFS. (2) BFS.

The two patterns:

DFS descends into the first child fully before touching the second. It naturally visits leaves before siblings further along. Use it when "deepest" or "exhaustive in this subtree" is what you want — listing every file under a directory, computing aggregates that need every descendant resolved (height, size, sum), or running any depth-first algorithm that the next twenty topics will be variations on.
BFS sweeps each depth level completely before going deeper. Use it when "shallowest" or "closest to the root" is what matters — the shortest path to a target node, the first match at a known minimum depth, or any "spread out from a starting point" shape (which becomes Dijkstra and friends in the graph chapters).

Both are O(n) in time. The space costs differ: DFS uses O(h) for the call stack (or an explicit stack), where h is the height of the tree; BFS uses O(w) for the queue, where w is the maximum width of any single level. On a balanced binary-ish tree, w can be much larger than h, so DFS is usually lighter on memory. On a thin-and-deep tree, BFS can be lighter. Pick based on what you're searching for first; the memory difference is the tiebreaker.

Where the bugs hide

Tree code has its own family of bugs, distinct from the off-by-ones of array code and the null-deref family of linked-list code. The single most common one comes from copy-pasting binary-tree code into a general-tree context.

Here is a candidate `height` function for a general tree. One thing about it is wrong — it would work for a binary tree, but not the n-ary trees this lesson is about. Click the buggy step.

Every textbook's first tree code is a binary tree, and a binary-tree node has node.left and node.right instead of node.children. A learner who pattern-matches on that shape will write Math.max(height(node.left), height(node.right)) for a general tree — and silently miss every child past the second. The fix is to iterate node.children and reduce, never to assume two of anything.

The other tree-bug families worth naming:

Stack overflow on deep recursion: a tree of depth 10⁵ in a language with a default 1 MB stack will crash a recursive traversal. Symptom: RangeError: Maximum call stack size exceeded in JavaScript, RecursionError in Python. Fix: convert to iterative DFS with an explicit stack — bfs() already shows the shape; just swap the queue for a stack.
Accidental cycles on mutation: setting child.children.push(ancestor) turns the tree into a graph. The next traversal never returns. Defensive fix: in any mutation API, walk up the parent chain and reject the insertion if the new child is already an ancestor. (This requires parent pointers, which is one reason real tree libraries keep them.)
Lost subtree on rewire: replacing parent.children[i] with a new node without first detaching the old subtree leaks memory in non-GC languages and silently disconnects nodes everywhere else. Same family as "losing head" in linked lists.

Where this shows up in real systems

Trees are everywhere data has hierarchy, which means everywhere. Three places worth knowing:

Filesystems — ext4, btrfs, NTFS, APFS, the Mac you may be reading this on. The directory hierarchy is a general tree of inodes; / is the root, every directory is an internal node, every regular file is a leaf. Hard links and bind mounts technically turn it into a directed acyclic graph, but the abstraction every command-line tool and every program operates on is a tree. Every cd, every ls, every find walks it.
The DOM — the HTML you render every day is a tree. document is the root; <html> is its only child; <head> and <body> are siblings; everything else descends from one of those. Browsers expose both the array-of-children shape (childNodes) and a first-child / next-sibling pair of pointers (firstChild, nextSibling) — different layouts for different traversal patterns. React's reconciler walks a tree of fibre nodes that mirrors this exact shape.
Compiler ASTs — every parser you have ever used (Babel, the TypeScript compiler, GCC, Clang, rustc) builds a general tree from your source code, then transforms it into another tree, then another. The whole compilation pipeline is one tree-to-tree transform after another. The ESTree spec exists to standardise the shape of one such tree; if you have ever written a Babel plugin you have written a tree visitor.

The structure shows up so often that the question is rarely "should I use a tree here?" — it's "what kind of tree?" Sorted? Balanced? Binary? Flexible-degree? The next several lessons answer those one at a time.

What to learn next

Binary trees — the special case where every node has at most two children. Almost every algorithm textbook lives here, for good reason.
Binary search trees — what happens when you add a sorted-order invariant to a binary tree. The first place "use a tree" beats "use an array" for some real workload.
Heaps — a different invariant on a binary tree, and the data structure that makes priority queues fast.