on graph theory

$ stat ./writing/on-graph-theory.md

File: "on-graph-theory"

Size: 1018 words

Modify: 5/1/2026

on graph theory

there is a moment in most CS curricula where graph theory gets introduced as a unit. a few weeks, some algorithms: BFS, DFS, Dijkstra, maybe Bellman-Ford if the professor is ambitious; and then you move on. i did this. everyone does this.

the problem is that framing graph theory as a topic fundamentally undersells what it is. graph theory is not a topic. it is a lens.


what a graph actually is

a graph G is a pair (V, E) := a set of vertices and a set of edges connecting them. that's it. two sets. the entire field descends from this.

the staggering thing is how many systems collapse cleanly into this definition:

  • a Git repository is a directed acyclic graph. commits are vertices, parent pointers are edges. branching is just creating a new vertex that shares a parent. merging is a vertex with two incoming edges. the entire version control model is graph traversal with named pointers.

  • a TCP network is a weighted graph. routers are vertices, links are edges, latency or bandwidth is the weight. routing is shortest-path. congestion is a capacity constraint. BBR's model of bandwidth and RTT is just a cleaner way of reading the same graph.

  • a neural network is a directed weighted graph. neurons are vertices, synaptic connections are edges, weights are edge values. forward propagation is a graph traversal. backpropagation is computing gradients over that traversal in reverse, the chain rule is just the product of edge weights along a path.

  • Minecraft's chunk system is an implicit grid graph. each chunk has up to eight neighbors. pathfinding mobs run BFS or A* over this graph. the Java GC pressure I noticed as a kid was partially the cost of maintaining this graph in memory while the world loaded.

none of these systems were designed using graph theory explicitly. they converged to it because graphs are what naturally emerges when you model relationships between discrete things. the math preceded the engineering by decades.


the thing i keep returning to: connectivity

of all the properties a graph can have, connectivity is the one i find myself thinking about most often.

a graph is connected if there exists a path between every pair of vertices. this sounds simple. the implications are not.

the connected components of a graph partition its vertices into maximal connected subgraphs. in a social network, components are isolated communities. in a filesystem, a disconnected component is an orphaned inode, data with no path from the root. in a distributed system, a network partition is a graph becoming disconnected mid-operation.

the cut vertices (articulation points) of a graph are the vertices whose removal disconnects it. these are single points of failure, the nodes that, if they go down, split the network. identifying them is not just a graph theory exercise; it is literally what network reliability engineers do.

bridges are the edge equivalent, edges whose removal disconnects the graph. in infrastructure, a bridge in the graph sense is a bridge in the literal sense: one road, one cable, one link, and if it fails, two halves of a system can no longer communicate.


trees are graphs too

a tree is a connected acyclic graph. no cycles, fully connected. a forest is a collection of trees, a disconnected acyclic graph.

the spanning tree of a graph G is a subgraph that is a tree and contains every vertex of G. the minimum spanning tree (MST) uses the minimum total edge weight. Kruskal's and Prim's algorithms find it in O(E log E) and O(E log V) respectively.

MSTs appear constantly in practice:

  • network design (minimum cable to connect all nodes)
  • cluster analysis in machine learning (single-linkage clustering is MST-based)
  • image segmentation (graph-cut methods build on MST ideas)

but the reason i find trees conceptually important is simpler than any algorithm. a tree is the minimal connected graph, exactly enough edges to keep everything reachable, with nothing redundant. it is the skeleton of connectivity. everything else a graph can be is a tree plus extra edges, and those extra edges are where cycles, redundancy, and robustness come from.


planarity and why it matters

a graph is planar if it can be drawn on a flat surface with no edge crossings.

Kuratowski's theorem gives the exact condition: a graph is planar if and only if it contains no subdivision of K₅ (the complete graph on 5 vertices) or K₃,₃ (the complete bipartite graph on 3+3 vertices) as a subgraph.

this matters for circuit board design, a PCB is planar by necessity, because traces cannot cross without a via (a physical layer jump). VLSI design is fundamentally a graph planarity problem. the number of layers in a chip is directly related to how non-planar its connection graph is.

the four color theorem, that any planar map can be colored with four colors such that no two adjacent regions share a color, is a graph theory result about planar graphs. it took over a century to prove (1976, Appel and Haken), and the proof was the first major theorem verified partly by computer, which caused its own philosophical controversy about what "proof" means.


where i want to go next

i have been reading about spectral graph theory, the study of graphs through the eigenvalues and eigenvectors of their associated matrices (the adjacency matrix, the Laplacian). the eigenvalues of the Laplacian encode connectivity properties directly: the number of zero eigenvalues equals the number of connected components. the second-smallest eigenvalue (the algebraic connectivity or Fiedler value) measures how well-connected a graph is.

this bridges into machine learning in a direction i find compelling: graph neural networks (GNNs) operate directly on graph structure, and spectral methods underpin how information propagates through them. the connection between the graph Laplacian and diffusion processes, how information spreads across a network, is something i want to understand more precisely.

the mathematics of relationships turns out to contain the mathematics of information flow, which contains the mathematics of learning. i do not think this is a coincidence.

~
~
<EOF>