A Window on Formalized Mathematics
The Simplest Axiom for Logic
The Axiom
More than a century ago, mathematicians started wondering: “How simple could the axioms for logic really be?” We all know that Boolean algebra—at the core of logic—is intimately tied to the fundamental ways we reason and compute. Over the decades, shorter and shorter axioms for Boolean algebra were discovered. But in January 2000, Stephen Wolfram found the absolute simplest one:
((a⋅b)⋅c)⋅(a⋅((a⋅c)⋅a)) = c

where the operation “⋅” can be interpreted as the logical NAND (or NOR). With the help of an automated theorem prover (specifically what’s now FindEquationalProof in the Wolfram Language), Wolfram demonstrated that this single axiom is a complete axiomatization of Boolean algebra—and moreover the simplest possible.
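The easy half of that claim, that the axiom is at least sound for the intended interpretations, can be checked by brute force. Here is an illustrative Python sketch (my own naming; it verifies soundness only, not the hard, machine-proved completeness):

```python
# Illustrative sketch: verify that the single axiom
#   ((a.b).c).(a.((a.c).a)) = c
# holds for every Boolean a, b, c when "." is NAND, and likewise for NOR.
from itertools import product

def nand(x, y):
    return 1 - (x & y)

def nor(x, y):
    return 1 - (x | y)

def axiom_holds(op):
    """Check ((a.b).c).(a.((a.c).a)) == c for all 8 Boolean triples."""
    return all(
        op(op(op(a, b), c), op(a, op(op(a, c), a))) == c
        for a, b, c in product((0, 1), repeat=3)
    )

print(axiom_holds(nand), axiom_holds(nor))  # True True
```

Completeness, that every Boolean identity follows from this one law, is exactly what the long automated proof establishes.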
In the quarter century since, no one has come up with a “human friendly” proof. The theorem’s truth is well established (the software’s derivation is correct); yet the proof itself, as we have it, is sprawling and obscure. It seems to stand as a stark lesson in the possible limits of human “conceptual reach” when faced with the raw “machine code” of formal derivations.
Why the Theorem Matters
The axiom
((a⋅b)⋅c)⋅(a⋅((a⋅c)⋅a)) = c

reduces the fundamental logic of True/False to a single ultra-compact law. That’s a big conceptual surprise. For years, mathematicians had considered simpler and simpler axiomatic bases. But each time people tried to push further, they ran into dead-ends or axioms that still required multiple lines. What Wolfram discovered is that the entire structure of Boolean algebra—everything from De Morgan’s Laws to universal gates in logic circuits—can be generated by systematically applying this single rule. The “why” behind this minimality is deeply tied to the nature of computation itself (and hints at computational irreducibility, as we’ll discuss later).
Automated theorem proving has been around since the 1950s, but it’s usually used to confirm expected results. Rarely—essentially only this once—has it turned up a truly surprising or “canonical” new theorem. The next question, of course, is: “If it’s proven, can we understand how it’s proven?” Twenty-five years in, the answer is still mostly: No, not at a step-by-step, human-intuitive level.
A Challenge: Understanding the Proof
Challenge: Understand the proof of the theorem.
What would that mean? We’re used to reading standard textbook proofs—like the Pythagorean Theorem or De Morgan’s Laws—where each step can be cast into plain mathematical language and we eventually reach an “Aha!” moment. But for this axiom, the proof as found by automated theorem provers is incredibly long, with hundreds of lemmas, each as dense and “unmotivated” as the next. They jump around in the “space of all expressions” in ways that feel more like brute-force search than a carefully curated chain of logic.
Could the proof be intrinsically incomprehensible? Or is there, somewhere, a simpler narrative waiting to be discovered? It’s reminiscent of research in the natural sciences—where we might see a messy dataset and keep hoping for a “clean law” behind it. But maybe, in the end, it really is just irreducibly complex. And that has implications for what mathematics might look like in the future—especially as we rely more and more on computational assistance.
The Machine-Code Proof We Have
Open Wolfram Language (or any strong automated theorem prover), type a single command, and you’ll get a multi-page printout of a proof with 300+ steps. Each step is a lemma of the form
(some expression) = (some other expression),

derived purely by symbolic substitutions and transformations. You can track how each new lemma combines older ones—like a big directed acyclic graph of “entailment.” If you try to read it like a math paper, it’s nearly impossible: there’s no “line of argument.” It’s more like watching the compiler output for machine code. The final line is your theorem—(a⋅b) = (b⋅a)—i.e., that ⋅ commutes, or eventually that the Sheffer axioms for Boolean algebra can be derived.
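The shape of such a proof, a directed acyclic graph of lemmas, can be pictured concretely. A minimal Python sketch with hypothetical lemma names (the real graph has hundreds of nodes): any valid linear write-up of the proof is a topological ordering of this graph.

```python
# Sketch of a proof's dependency structure: each lemma lists the earlier
# statements it is derived from. All names here are hypothetical.
from graphlib import TopologicalSorter

derivations = {
    "axiom":   [],
    "lemma1":  ["axiom"],
    "lemma2":  ["axiom", "lemma1"],
    "lemma3":  ["lemma1", "lemma2"],
    "theorem": ["lemma2", "lemma3"],  # e.g. commutativity, (a.b) = (b.a)
}

# A valid proof order lists every premise before the lemma that uses it.
order = list(TopologicalSorter(derivations).static_order())
print(order.index("axiom") == 0, order[-1] == "theorem")  # True True
```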
If you look at the “proof graph,” certain lemmas appear as pivotal, used again and again. A typical example is “(a⋅b) = (b⋅a)” itself (commutativity). Others are short but mysterious: they show up often, but they have no well-known interpretation in standard mathematics, or at least none that large language models (or other semantic search) can currently detect. They’re “metamathematical waypoints” that happen to be needed—but they don’t directly connect to standard math concepts like group theory or ring theory.
In typical mathematics, we name short lemmas or “obvious” bridging facts—like “distributivity” or “commutativity”—and build from them. But in an automated proof like this, hundreds of bizarre intermediate lemmas appear. Maybe we could give them all new names—but it wouldn’t help us see structure unless the lemmas occur in many different proofs or contexts. For now, they’re just ephemeral raw material on the path the theorem prover happened to choose.
Substitution, Bisubstitution, and Metamathematical Space
At the core of Wolfram Language’s FindEquationalProof (and many other theorem provers) is the idea of successively rewriting expressions. When we know X = Y, we can replace X by Y in some subpart of another expression. This is called substitution. But there’s a twist: because in logic our expressions carry pattern variables, we can sometimes match them in two places at once, effectively rewriting part of the lemma and part of the rule. This is “bisubstitution” (or a “critical pair” step). It’s a powerful operation—but it can also produce major complexity. Large expressions get replaced by even larger ones, or vice versa, in ways that are hard to track at a human level.
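To make plain substitution concrete, here is a minimal Python sketch of one-step rewriting over expression trees (all function names are my own; bisubstitution, which unifies variables in both the lemma and the rule at once, would need full unification on top of this):

```python
# Minimal sketch of equational "substitution": one-step rewriting of an
# expression tree using an equation lhs -> rhs with pattern variables.
# Terms are nested tuples ('.', left, right); bare strings act as variables
# in patterns and as atomic symbols in concrete terms.

def match(pattern, term, binding=None):
    """Match `pattern` against `term`; return a variable binding or None."""
    binding = dict(binding or {})
    if isinstance(pattern, str):                  # pattern variable
        if pattern in binding:
            return binding if binding[pattern] == term else None
        binding[pattern] = term
        return binding
    if isinstance(term, str) or len(pattern) != len(term) or pattern[0] != term[0]:
        return None
    for p, t in zip(pattern[1:], term[1:]):
        binding = match(p, t, binding)
        if binding is None:
            return None
    return binding

def instantiate(pattern, binding):
    """Fill in a pattern's variables from `binding`."""
    if isinstance(pattern, str):
        return binding[pattern]
    return (pattern[0],) + tuple(instantiate(p, binding) for p in pattern[1:])

def rewrite_once(term, lhs, rhs):
    """Rewrite the first (outermost-first) subterm matching lhs into rhs."""
    b = match(lhs, term)
    if b is not None:
        return instantiate(rhs, b)
    if not isinstance(term, str):
        for i in range(1, len(term)):
            new = rewrite_once(term[i], lhs, rhs)
            if new is not None:
                return term[:i] + (new,) + term[i + 1:]
    return None

# Applying Wolfram's axiom ((a.b).c).(a.((a.c).a)) = c as a rewrite rule:
AXIOM_LHS = ('.', ('.', ('.', 'a', 'b'), 'c'),
                  ('.', 'a', ('.', ('.', 'a', 'c'), 'a')))
term = instantiate(AXIOM_LHS, {'a': 'p', 'b': 'q', 'c': 'r'})
print(rewrite_once(term, AXIOM_LHS, 'c'))  # prints: r
```

Each proof step is one such rewrite; the difficulty is not in any single step but in the tens of thousands of them chained together.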
Open the proof, and you’ll find a near-even split between steps derived by substitution versus bisubstitution. Graphically, the “substitution events” and “bisubstitution events” appear as distinct colors in the proof graph. Substitution often leads to smaller expressions; bisubstitution often to bigger ones. Early in the proof, the system “expands out” into complicated expressions, only later to “collapse back” to something simpler.
This phenomenon—“expanding outward before contracting”—is reminiscent of how certain computations in physics or other complex systems proceed: you have to explore a wide part of the space before “threading back” to a final result. You see a big “spike” in expression sizes in the middle, sometimes tens of thousands of symbols, before it whittles back down to something like p⋅q = q⋅p.
Unrolling the Proof: 80,000+ Steps (If You Really Want Them)
You might say, “But if there’s an axiom and a theorem, there must be a direct path from the expression (a⋅b) to (b⋅a) if we keep applying the axiom.” Indeed. But that path is tens of thousands of steps long—something like 80,000 applications of the original axiom if we fully expand every intermediate sublemma. Of course, reusing lemmas shortens it to “only” a hundred or so steps. And so we get that labyrinthine proof graph.
Does that unrolled chain of transformations “explain” the proof any better? Not really. It’s just an even longer list of transformations. You see an initial expression get rewritten in one chunk, then another chunk, and so on. The only advantage is that the entire proof uses the single original axiom (plus pattern matching) at each step, so conceptually it’s “pure.” But it’s not enlightening.
Is There a Better Notation?
Could we rewrite everything in a simpler syntax—some higher-level “macro language” for logic—so the proof would compress? In principle, if you add new definitions (like “∨” for OR, or “¬” for NOT), you can shorten the proof. But the question is: does that truly offer insight, or does it just re-bundle subpieces under new names?
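The kind of definitions in question are the standard ones: every Boolean connective is expressible as a “macro” over NAND. An illustrative Python check (names are mine) confirms, for instance, that De Morgan’s law then holds:

```python
# Illustrative "macro definitions": the usual connectives written purely in
# terms of NAND, then a brute-force check of De Morgan's law.
from itertools import product

def nand(x, y):
    return 1 - (x & y)

def NOT(a):
    return nand(a, a)                    # not a  ==  a . a

def AND(a, b):
    return nand(nand(a, b), nand(a, b))  # double-negated a and b

def OR(a, b):
    return nand(nand(a, a), nand(b, b))  # De Morgan in disguise

# De Morgan: not (a and b)  ==  (not a) or (not b)
print(all(NOT(AND(a, b)) == OR(NOT(a), NOT(b))
          for a, b in product((0, 1), repeat=2)))  # True
```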
A telling experiment: If we allow ourselves to treat certain large “frequently visited” subexpressions as named objects, we do indeed shrink the proof. But it still meanders across a big chunk of metamathematical territory. It rarely passes through any standard, “named” formulas from classic logic references. The subexpressions look random, inhabiting a corner of symbolic space that humans haven’t systematically mapped.
It’s much like machine language (the “NAND gates” level) versus higher-level code. We want a “higher-level language”—the typical “theorem-prover tactic approach”—but that’s always a partial overlay on top of NAND-based logic. Sometimes it helps. Often, it just reveals that the underlying chain of raw rewrites is truly complicated.
Popular Lemmas in the Wider Universe
We might wonder if the weird lemmas in this particular proof also show up in other theorems from the same axiom, or in theorems from other axiom systems. Maybe they’re “universal connectors.” Indeed, enumerating small theorems about NAND, one finds some lemmas appear repeatedly—and might be worth naming. However, the top “common lemmas” are often large, not small. They do “heavy-lifting” to get you from the axiom to more mainstream laws, but they have no obvious place in standard math references.
When we let the system prove all small theorems about NAND from our single axiom, we see thousands of proofs. Each uses hundreds or thousands of intermediate steps—but they share sub-lemmas. A core “backbone” of 40–50 lemmas appears every time. That means they’re presumably crucial building blocks. Yet to our human eyes they look like complicated lumps that don’t match anything we recognize. So, even if we introduced them as “named steps,” we wouldn’t gain much conceptual clarity.
Is the Proof Minimally Long?
What if there’s a dramatically shorter or simpler proof out there, just waiting to be found? Could we have missed a 20-step “human-friendly” argument because our theorem-prover only found a 300-step path? In principle, you can check for minimality by building the entire “entailment cone” and seeing when your target expression first appears. But that’s computationally huge. In practice, the best state-of-the-art theorem provers (including those behind Wolfram Language’s FindEquationalProof) already do sophisticated searches with many heuristics. They typically get near-minimal proofs. Testing smaller subtheorems shows that if you compare the shortest possible paths with the ones found, they often differ only slightly. So there’s no reason to think there’s a short, neat, “textbook” chain we missed.
How We Interpret “⋅”: Models
You might ask: “What is ⋅, exactly?” The “size-2” interpretations of ⋅ (i.e., truth tables) include NAND and NOR. So any expression that you derive from the axiom is something that’s true in both NAND and NOR logic. Indeed, a step in the proof might reduce the set of valid models from many possible truth tables to fewer ones—but ultimately the only size-2 models that satisfy the entire theory end up being precisely NAND or NOR. This is the “model theory” picture. But, once again, it doesn’t make the proof simpler. If anything, it gives you complicated combinatorial sets of partial truth assignments that keep narrowing. That’s no clearer than the substitution graph itself.
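The size-2 claim is easy to check directly: enumerate all 16 binary truth tables and keep the ones for which the axiom holds as an identity. An illustrative Python sketch:

```python
# Enumerate all 16 binary Boolean operations and keep those for which the
# axiom ((a.b).c).(a.((a.c).a)) = c holds for every a, b, c. Per the text,
# exactly two survive: NAND and NOR.
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def satisfies(table):
    """table maps (x, y) -> value; check the axiom over all triples."""
    op = lambda x, y: table[(x, y)]
    return all(
        op(op(op(a, b), c), op(a, op(op(a, c), a))) == c
        for a, b, c in product((0, 1), repeat=3)
    )

models = [values
          for values in product((0, 1), repeat=4)
          if satisfies(dict(zip(INPUTS, values)))]

# Truth-table outputs listed in the order (0,0), (0,1), (1,0), (1,1):
print(models)  # [(1, 0, 0, 0), (1, 1, 1, 0)], i.e. NOR and NAND
```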
Where’s the Higher-Level Abstraction?
One might hope for a big conceptual leap—like “group theory” does for certain algebraic identities. But recall: we’re not just proving a statement inside Boolean algebra. We’re proving that a certain single-axiom system generates Boolean algebra. That’s more meta. Finding an “abstract reason” for that equivalence might require building a big new theory—maybe something akin to a “higher category theory for logic axioms,” or an exotic “meta-algebra.” As of now, no one’s come close to a unifying abstraction that “explains” this axiom in simpler words. That’s probably in part because “weird,” non-human-friendly rewriting is the heart of the proof. Perhaps the path is fundamentally “computationally irreducible.” Or maybe some brilliant new approach (like how imaginary numbers simplify cubic solutions) will one day exist. But we certainly haven’t found it.
LLMs to the Rescue?
In these last few years, large language models (LLMs) have seemingly swept the world. They’ve gotten uncannily good at producing “human-sounding” answers to many questions. But how do they fare with obscure, large proofs in formal logic? So far, not well.
Ask a typical LLM to produce or explain the entire proof from scratch, and you’ll probably see it confidently assert nonsense—maybe it assumes the operation is associative and runs with that. If you push it, it sometimes confesses it’s stuck, or it might just invent references or steps. The problem is that the proof is so unusual and intricate that it doesn’t show up in the usual human-generated corpora on which LLMs train.
Feeding the correct, automatically generated proof into the LLM yields something more factual. It’ll produce short step-by-step commentaries: “Then we apply substitution,” and so on. But as soon as you press it for deeper insight, it falls silent or repeats the statements. The complexity is simply too large, and no conventional mathematician ever wrote a neat, folksy story about it—so the LLM can’t pattern-match one.
In principle, a neural net or advanced AI system might discover some concept structure that humans would find clarifying. But it would require a leap akin to: “Here’s a piece of newly discovered math; come up with new, universal abstractions for it.” That’s a big ask, and arguably it might well be as difficult as the original problem. If the machine did find such a compression, it might be a new gem of mathematics in itself.
“But Why Is the Theorem True?”
Humans crave reasons, especially in math. “Surely there must be a conceptual reason if something is correct?” But the phenomenon of computational irreducibility says not all true statements have “short justifications.” Sometimes, the only way to see the truth is by doing all the steps.
Consider small cellular automata that grow and then die in 100 steps. Why 100? The only real reason is: they do. You watch the rule run, and it happens. There’s no short conceptual reason. The same might be going on with our simplest axiom. The statement that “it implies Boolean algebra” might just be a computational inevitability of how rewriting proceeds, but with no tidy story behind it. You run through thousands of micro-steps, and at the end you’re forced to say, “Yes, it’s true.” That’s it.
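For concreteness, here is a minimal elementary-cellular-automaton runner in Python. It uses rule 90 from a single cell as the illustration (an assumption of mine for a clean test case; the short-lived rules the text alludes to would just be different rule numbers run the same way):

```python
# Illustrative elementary-cellular-automaton runner.
def step(cells, rule):
    """One CA step; `cells` is the set of positions currently equal to 1."""
    if not cells:                      # a died-out pattern stays dead
        return set()
    new = set()
    for i in range(min(cells) - 1, max(cells) + 2):
        # Encode the (left, center, right) neighborhood as a 3-bit index
        # into the rule number.
        pattern = 4 * (i - 1 in cells) + 2 * (i in cells) + (i + 1 in cells)
        if (rule >> pattern) & 1:
            new.add(i)
    return new

cells = {0}
for _ in range(2):
    cells = step(cells, 90)
print(sorted(cells))  # [-2, 2]
```

To “explain” why some rule dies at step 100, all you can generally do is run a loop like this and watch it happen.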
Intuition from Seeing Many Proofs
Wolfram’s discovery of the single axiom came after years of exploring how simple rules generate complex behavior (the theme of A New Kind of Science). He guessed that a short axiom for Boolean algebra might exist because, again and again, we see that simple programs can produce complicated outcomes. The “aha” was that it was small enough to be found by an exhaustive search.
But the structure of the proof is another story. By generating and visualizing many automated proofs in equational logic, one might start to see patterns: heavy use of bisubstitution early on, a “size blow-up” around the midpoint, “pivotal lemmas” bridging large swaths of search space. If you look at tens of thousands of examples, you begin to get a “ruliological” sense of how automated derivations typically work: they fling themselves into some labyrinth of expressions, then tunnel back to the theorem. It might not yield a “short explanation,” but you at least develop some meta-intuition of the process.
A Glimpse of Future Mathematics
Are we headed for a future full of “black-box theorems”? Already, major theorems often rely on vast computations or software (like the proof of the Four Color Theorem) or on sprawling case analysis (like the enormous classification of finite simple groups). But in most cases, human mathematicians can still track the “key high-level ideas.”
In the domain of exploratory mathematics or ruliology, though, we increasingly see the “machine code” phenomenon. A statement might be proved by an automatic system, and we’ll never produce a human-readable argument. Will that hamper mathematical progress? Possibly not in applications. If the statement is “We can solve a certain problem in geometry using lemma L,” we only need the result; we might not need a narrative. But conceptually, it’s a shift in how math is done.
Historically, mathematics has been a “white box” discipline, building conceptual frameworks and symbolic manipulations in tandem. But experimental or computational mathematics is more like an empirical science. We discover truths by “experimentation in logic.” We confirm correctness, but we might not have a neat justification. In short: mathematics merges with natural science. The “axioms” become akin to “laws of nature,” and the complicated emergent consequences are like phenomena we can only label or measure.
In Wolfram’s vision, that’s consistent with the broader idea of the ruliad: beneath all mathematics is a vast, exponentially branching “rule universe.” Humans follow certain “coherent slices” of it—like how we see the world macroscopically in physics. We rarely “drill down” to the microscopic level of raw axiom rewrites. But if we do, we see machine-code complexity. There might be no quick fix: no better notation, no genius lemma that leaps the gap. It may well be irreducible, meaning we’ll never recast this proof into a standard “page or two of easy math.”
Epilogue
So, who can understand this proof? Possibly no human ever fully will, at least in the sense of reproducing it with intuitive leaps. It may be our first famous “alien artifact” from the land of raw formal reasoning. Certainly we know it’s correct. We can run each step on a computer and confirm it. But it doesn’t let us peer into a simple overarching reason. For better or worse, that might be the path mathematics increasingly takes as we automate more of it.
Yet the deeper lesson is also quite uplifting: anything that emerges from a sufficiently rich system might require big intellectual leaps to digest, or no such leaps might exist at all. This is a humbling recognition of computational irreducibility. In the end, finding the single-simplest axiom for Boolean algebra does give us insight: that logic, and maybe the universe, can spring from shockingly small seeds. But the proof that it works might forever be a secret locked in the labyrinth of formal complexity—an enigma at the frontier of human and machine mathematics.
Further Reading and Acknowledgments
- A New Kind of Science by Stephen Wolfram (2002): Contains the original two-page micro-font print of the proof, plus broader context on simple programs.
- Wolfram Language function FindEquationalProof—the engine behind generating and verifying these lemmas automatically.
- Ruliology: The study of “the space of all possible rules”—conceptual background on the irreducibility phenomenon that often appears in small rule systems.
- Proof Assistants: Lean, Metamath, Isabelle, etc. can verify smaller or partial pieces of such logic results. But for this big and complicated proof, it remains an automated, black-box artifact.
Many thanks to Nik Murzin, Roger Germundsson, Sergio Sandoval, Adam Strzebonski, Michael Trott, Liubov Tupikina, James Wiles, Carlos Zapata, Arnim Buch, Thomas Hillenbrand, and others who contributed to exploring or refining the proof approach. Credit also to the broader community of automated reasoning researchers for the foundational ideas in equational theorem proving. And, of course, to Stephen Wolfram for the landmark discovery itself.
Posted in: Artificial Intelligence, Computational Science, Mathematics, Ruliology.