Unit Tests

Dec 29, 2025

I.

Once upon a time, there was a man named Albert, who said that the laws of physics—

(that everybody knew and agreed upon and had been using for hundreds of years to accurately predict the movement of everything from planets to cannonballs)

—were wrong, actually.

To be clear: there were lots of other people who also said that the laws of physics were wrong. Almost all of those people were crazy.

But Albert (it turned out) wasn’t crazy. Albert’s own answer was incomplete and imperfect, but it was less wrong than the established laws that had been written down by Newton, and it was also less wrong than the answers of the thousands of crackpots who had been correctly dismissed as crackpots.

In order to be sane, in order to be effective—

(In order to be right.)

—you need to be able to sort the wheat from the chaff. You need to be able to figure out how to tell the stuff that sounds crazy

(but isn’t)

apart from the stuff that is crazy

(including when it sounds very normal and familiar).

And whatever method you choose, you want that method to work on Albert. You don’t know what future Alberts will look like, when you come across them, but you do know what the actual historical Albert looked like, at the time. The example of Albert is just sitting there, as a way for you to ensure that your method is working properly.

I call this sort of thing a “unit test.” In computer programming, a unit test is a snippet that makes sure that a given unit of code is functioning properly, and doing what it’s supposed to do. If you have code that is meant to add two numbers together, you might check that code with some unit tests like:

1 + 1 (should give you back 2; if it doesn’t, your code has a problem)
1 + 0 (should give you back 1)
1 + -1 (should give you back 0)
1 + 1000000000000000 (should give you back 1000000000000001)
1000000000000000 + 1000000000000000 (should give you back 2000000000000000)
1 + π (should probably give you back something in the vicinity of 4.14159)
1 + q (should probably give you back some sort of error)

…and so on.

In other words, your unit test checks a bunch of questions whose answers you already know, to make sure that your answer-finding algorithm at least gets those right.

Unit tests don’t guarantee that your method will get everything right, but they’re a start. If you have some general method that is passing all of your unit tests, you’re probably at least in the vicinity of the real answer. There aren’t many general processes that spit out the right answer to all of the addition problems above, that aren’t just … addition.
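For the programmers in the audience, here is a rough sketch of what those addition checks might look like in Python. The names `add`, `run_unit_tests`, and `candidate` are made up for illustration, and the tolerance on the π check is an arbitrary choice standing in for "in the vicinity of."

```python
import math
from numbers import Number


def add(a, b):
    """Hypothetical unit under test: add two numbers, reject anything else."""
    if not (isinstance(a, Number) and isinstance(b, Number)):
        raise TypeError("add() only accepts numbers")
    return a + b


def run_unit_tests(candidate):
    """Check a candidate adder against questions whose answers we already know."""
    assert candidate(1, 1) == 2
    assert candidate(1, 0) == 1
    assert candidate(1, -1) == 0
    assert candidate(1, 1000000000000000) == 1000000000000001
    assert candidate(1000000000000000, 1000000000000000) == 2000000000000000
    # Floating point is approximate, so only ask for "in the vicinity of" 4.14159.
    assert math.isclose(candidate(1, math.pi), 4.14159, abs_tol=1e-5)
    # "q" is not a number, so the right answer here is some sort of error.
    try:
        candidate(1, "q")
    except TypeError:
        pass
    else:
        raise AssertionError("adding 1 and 'q' should have raised an error")
    return True
```

Calling `run_unit_tests(add)` returns `True`. Handing it a process that is not addition, such as `lambda a, b: a * b`, raises an `AssertionError` on the very first check, which is the whole point: the known answers catch the broken method before you trust it with an unknown one.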

II.

It seems to me that most people don’t go around “collecting unit tests,” but I certainly do. Whenever I notice a place where a bunch of people got it wrong (but some small number of people got it right), I write it down in my brain:

Dear Future Duncan, make sure that whenever you’re talking to someone who sounds like a crackpot, you decide whether-or-not-to-blow-them-off using a method that wouldn’t cause you to accidentally blow off Albert Einstein, because that would be embarrassing.

(I mean, it was a smidge more forgivable back in 1905, but we just … have the historical example. It would be like failing an open-book test.)

There are a couple of Big Obvious ones, that everybody talks about. For instance: slavery. Back in the 1600s, there were quite a few Americans who thought slavery was great, actually, and even most of those who disagreed did so in a fairly quiet, shrugging sort of way.

If you want to be the sort of person who would get the question of slavery right in the year 1600, then you obviously can’t rely on consensus, because if you happen to be born on a Southern plantation and get raised by a bunch of people who are really confident that black folk are inferior and servitude is their rightful place, then you’ll grow up feeling the same way.

Similarly, you can’t rely on the law to tell you what’s right and what’s wrong, because slavery was legal. (The law is a little bit different from consensus, but mostly like how the crystallized parts of honey are different from the smoothly flowing parts.)

You can start with the question of slavery, and ask yourself okay, how could I get this one right? If I were raised by slaveowners in the 1600s, what sorts of mental motions would allow me to notice that slavery was a moral horror anyway, despite all of the ready-made arguments and justifications floating around me?

It’s sort of like sitting down to code addition, and starting with “well, gee, how do I make this computer take in ‘1’ and ‘1’ and spit back ‘2’?”

But of course, once you’ve built your method around the example of slavery, you need some other unit test to check.

Another big, popular one is “Nazis.” i.e., “would the algorithm that I’m running cause 1930s-German-me to recognize that Jews are fundamentally human, and that my government probably shouldn’t take charge of the entire world (through force if necessary)?”

i.e. would a version of you who was born in early-1900s Germany and raised by early-1900s Germans and had access to only the information that early-1900s Germans had access to, but who was thinking about this question the way that you’re thinking about questions of truth and morality now, actually get it right? Are you asking yourself the kinds of questions, and holding yourself to the kinds of standards, that would cause you to pass the test that so many actual 1930s Germans failed?

Note: it might be the case that 1930s-German-you is constrained in their ability to act. It might be the case that 1930s-German-you can’t actually save any Jews, or interfere with the Nazi war machine. It would be great if they could! But this isn’t about action as much as it is about the mental landscape. An algorithm that recognizes that:
[Jews shouldn’t be genocided and France shouldn’t be invaded]
…but does nothing about it, is still superior to an algorithm that concludes:
[Jews are subhuman and we should invade and conquer France]
…because there might be some situations in which the former algorithm stumbles upon some cheap opportunities to do good, around the edges. The former algorithm is at least primed for goodness, as opposed to being shaped so as to maintain and encourage evil.

III.

There are also a lot of smaller and less-obvious ones, though, and the rest of this essay is basically just a list of a few that I think are important. I think that, if you want to consider yourself a smart and moral person, you should make sure that whatever process you’re using to answer questions and make decisions and form opinions spits out the correct answer on each of these.

(Noting, again, that processes like “just believe what my parents and the church tell me” generally don’t pass this sort of test.)

So, the ones we already have:

Nazis (your process should conclude that the Nazis were wrong)
Slavery (your process should conclude that slavery is bad)
Einstein (your process should not dismiss Einstein the way that many of Einstein’s contemporaries did, prompting him to famously quip something along the lines of “Why so many? If I were wrong, it would have only taken one.”)

Others, in no particular order:

The Wright Brothers & Self-Driving Cars
There were newspapers declaring that heavier-than-air flight would never be achieved, or would take thousands or millions of years to develop, after the Wright brothers had already made their first flight at Kitty Hawk.

When you’re attempting to predict technological developments: are you doing so via a process that would’ve caused you to dismiss the Wright brothers out of hand? On the flip side, would your process (if applied in the year 2010) cause you to erroneously predict that self-driving cars would be everywhere by 2015?

(And then, in 2015, would it cause you to fall back to 2020?)

(And then, in 2020, would it cause you to fall back to 2025?)

Hands Drawn by Generative AI
For a while, generative AI was doing pretty poorly at things like hands and teeth.

Some people responded to this by sneering, and by clinging to the AI’s inability as a sort of security blanket. “See?” they said. “It can’t even get the hands right, it’ll never amount to anything.”

That period lasted for less than three years, and there’s a lesson there.

9/11
If your processes, applied to post-9/11 America, result in invasions of Iraq and Afghanistan and the assembly of a vast surveillance apparatus and hundreds of millions of people engaging in airport security theater that burns untold amounts of time and money, you have made an avoidable mistake.

The Capgras Delusion
The Capgras delusion is a rare mental illness that causes you to (wrongly) believe that one or more of the people you know and love have been replaced by perfect doppelgangers.

You almost certainly can’t “beat” the Capgras delusion in the sense that your injured mind would give up on its belief that this had happened. But you can know about the Capgras delusion, and try to structure your behavior such that you can make a distinction between “my brain is telling me this” and “therefore I think it’s actually true.”

If your way of doing things spits out “and then I abandoned my spouse, because they were replaced by a doppelganger,” then you have failed the unit test.

Psychedelic revelation
Similarly, there are millions upon millions of people who take The Drug That Makes You Believe God Exists Or Whatever, and then have the direct visceral experience of seeing that god exists (or whatever), and then come out of the experience concluding that god exists, actually.

From the outside, it’s easy to look at people taking psychedelics—

(or suffering from sleep deprivation, or going through bouts of mania or psychosis, or taking part in intense experiences like religious retreats or shamanic ceremonies)

—and recognize the mistake that they’re making. Recognize that they’re updating on din, that they went from not-believing X to believing X for inadequate reasons. That they are failing to distinguish between the outputs of the Feel Like X Is True process, and the question of whether or not there’s real actual reason to think X is true.

But would you pass this test? If you put yourself through the Feel Like X Is True process, and came out (predictably) feeling like X is true…would you be able to notice what had happened, and correct for it?

Scientology and the Council of Nicaea
Similarly, sometimes people will say “hey, I’m about to make a bunch of stuff up and lie to you, okay?” and then they will make a bunch of stuff up and tell a bunch of lies,

and then, inexplicably,

people will believe them.

Unit test: does your way of going about [assessing evidence] and [forming beliefs] allow you to notice that many religious claims are made up, they literally documented themselves making it up, come on, this one isn’t hard, guys,

Close-minded atheism
It’s maybe a little bit punching down to pick on religion—

(although seriously, your mental processes should cause you to conclude that the religion your parents practice is almost certainly wrong in almost all of its claims about the nature of the universe)

—but there are ways of doing not-religion that are bad, too. Religions are highly evolved memeplexes—they’re still around because they keep proving useful. There is a unit test that is something like “don’t be a stupid atheist,” i.e. don’t let your rejection of the false metaphysical claims of any given religion blind you into thinking that that religion is devoid of practical wisdom about how to live a good life, or what sorts of things allow people to live in harmony with one another and with the world around them.

Unit test: does the specific process you’re using in order to prevent yourself from falling for the memeplex cause you to throw the baby out with the bathwater, and be unable to mine that memeplex for good and true and useful things?

The Failures of Your Parents
If your process for metabolizing and expressing your emotions ends in you blowing up in the line at Disney World and ruining the family reunion, that process can probably be improved.

Shoes
Be careful not to fall into the trap that looks like:

Wearing shoes that atrophy your feet
Developing foot problems as a result
Solving the foot problems with extra orthotics and support
Discovering that your feet have weakened even further
Adding yet more shoe to compensate

If your way of assessing and addressing problems would cause you to try to fix [Too Much Shoe] with [Even More Shoe], then it is probably doing even worse stuff elsewhere.

Climate Change, Nuclear War, Y2K, the Ozone Layer, Malthusian Famines
The trick here is distinguishing between:

Predicted disasters that were never actually threats at all
Predicted disasters that were serious threats, that were averted because they were taken seriously
Predicted disasters that were serious threats, that were averted because we got lucky
Predicted disasters that continue to loom despite partial progress in averting them

…etc. When you’re setting out to assess a predicted disaster such as human extinction from artificial superintelligence, you want your process, applied to past history, to spit out responses like:

The ozone layer and Y2K problems were real, actually, and were averted thanks to the concerted efforts of thousands.
The nuclear war problem was real, and humanity got through it via taking it seriously and getting lucky; there are nearby universes that are very very similar to our own where the world blew up in the ’70s or ’80s.
Climate change is simultaneously a real and oncoming threat, and also not necessarily as swift or apocalyptic as some people want to assure you it definitely is, and also it could in fact be that apocalyptic, there are some deeply concerning trends that we do not fully understand and they could in fact run away from us with surprising speed.

Peanut allergies
Whatever process you’re using to make decisions, as a culture, should probably both let most kids have Reese’s cups and also not kill kids with peanut allergies.

Left-handedness
Back when we used to beat left-handed children, the rates of left-handedness hovered around 2-3%. Once we stopped, they rose to around 10-12%.

If you’re using some process to predict and understand things that are taboo or heavily disincentivized, you should check to see whether that process would correctly spit out conclusions like:

A lot more than 3% of people are left-handed, actually
Don’t panic about the upswing; the rate of apparent left-handedness is not actually going to go up to 100%; it will level off
Nothing awful is going to happen now that we’ve stopped beating left-handedness out of people

COVID-19
Were you “right” about COVID-19? Did you see it coming? Did you take sensible actions in response?

If not, could you have? Could you tweak your threat assessments and decisionmaking processes into a shape that would cause you to be ahead of the curve, on COVID-19?

(Ditto Bitcoin, and also ditto all the Bitcoin-like opportunities that would have lost you money.)

I’m ending here, somewhat arbitrarily. For any given life domain, I’ve got dozens and dozens of unit tests stored somewhere in my brain, and it’s probably not worth it to try to make an exhaustive list.

(Although I’ve enabled comments for unpaid subscribers on this one, and I do genuinely hope you’ll leave one or two of your own unit tests as a comment below. Please? Pretty please?)

But hopefully, the examples above help illustrate the key point: when you’ve got a Way Of Doing Things, there’s actually a wealth of historical data out there, to let you see some of the ways it might be broken. You don’t have to take the time to go check and see whether you’re doing something that would cause you to be on the wrong side of history, but, uh.

You could?

…I think you should.

I mean, letting your kid die of measles in 2026 would just be embarrassing, no?

Sarah Nibs

Dec 29

I thoroughly believed in a fundamentalist Christianity, as far as _any_ outside observer could tell. Now I'm me. My assessments of people and their brains should not make the mistake of believing that lunatic religionists like past-me are necessarily lunatics in any fundamental sense.

Max Harms

Dec 30

In software there's a notion of test-driven development, where the way to add new features, fix bugs, or otherwise change code is to first find a unit test that the codebase currently fails and only then do the update. The failing test then serves as a kind of driver for progress. (I have mixed feelings about TDD, but here it's a metaphor.)

It's easy to list unit tests you pass. I'm curious to hear unit tests you are currently failing (but which are still valid).

I'll start: my epistemics predict way more theft, other minor crimes, and mayhem than we see in practice. Like, it's not that hard, I claim, to learn to pick locks and figure out who is home at 3am. There are a variety of updates I could make to explain it, but they'd Explain Too Much, and cause other unit tests around lying, cooperation, and incentive gradients to fail. Like, I currently pass the "almost all of the freight train cars older than 5 years have graffiti" unit test!


Homo Sabiens
