The Devil in Scale
Read the news for ten minutes and you will feel it. That small flash of contempt for whoever signed off on the obvious mistake. A bridge collapses. A policy meant to feed a region starves it. An invasion promised to take six weeks limps into its third year. The implication of our headshake, never quite spoken aloud, is that we ourselves, given the same chair, would have chosen otherwise. I want to take that feeling seriously and then take it apart. The old proverb tells us the devil is in the details. As far as it goes, this is true. The careful workman attends to texture, to grain, to the small wrong note in the third measure that ruins the symphony. That kind of attention is what makes a draftsman good. The proverb covers only half of where the devil actually lives, though. The other half, the larger half, the one that ruins more lives and breaks more empires, is scale. Same idea. Different size. Different beast entirely.
I.
Pour yourself a glass of ice water. Lift the cubes out with your fingers and drop them into a second glass; you keep almost all of them, almost effortlessly. Now try to lift the water itself. Your fingers come up wet. A few drops cling; most of the body stays in the glass. Now boil the water on the stove and let it rise as steam. Try to lift that. With a cloth you trap a few wisps; with a bag you capture a little. The molecules are identical. The handling problem is not. Ice, water, steam: the same substance, in different states, demands different methods, different tools, different theories of intervention. When you try the ice technique on the steam, you will fail spectacularly, and you will look stupid doing it. The physics of size makes the same point with numbers. Geometry alone forces a regime change when things get larger. The mass of an animal grows as the cube of its length, while the cross-section of its bones, which has to bear that weight, grows only as the square. Double the dimensions of a mouse and you have not made a bigger mouse; you have made an animal whose legs cannot quite hold it up. An elephant is not a mouse rescaled. The thicker legs, the column-like posture, the heat dissipation through enormous ears, these are not vanity. They are necessities forced on a body by the geometric arithmetic of being elephant-sized.
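A back-of-the-envelope sketch of that geometric arithmetic, in code; the scale factors are illustrative, not measurements of any particular animal.

```python
# Square-cube arithmetic: scale a body by a linear factor k.
# Weight grows as k**3; the bone cross-section that carries it
# grows only as k**2, so stress per unit of bone grows as k.
for k in [1, 2, 4, 8]:
    mass = k ** 3            # relative body mass (and weight)
    bone_area = k ** 2       # relative load-bearing cross-section
    stress = mass / bone_area
    print(f"scale x{k}: mass x{mass}, bone area x{bone_area}, "
          f"stress per unit of bone x{stress:.0f}")
# At x8 the same design carries eight times the load per square
# centimeter of bone -- which is why the elephant is a redesign,
# not an enlargement.
```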
Metabolism plays the same trick. Across living things, from shrews to whales, an organism’s energy use does not scale in proportion with its mass; it scales with mass to the three-quarter power. A creature ten thousand times heavier than a mouse burns only a thousand times more calories. A bigger animal is not just a more expensive small one. It is a different kind of engine, operating at a different rhythm, with a different relationship to the world around it. Scale a city and the same nonlinearity reappears in startling places. Doubling a city’s population does not double its output. It produces, on average, about fifteen percent more of nearly everything per capita: more patents per person, more wages per person, more new businesses per person. The catch, the catch that should give every urban planner a long pause, is that the fifteen percent applies to the bad outputs too. More violent crime per capita. More infectious disease per capita. More loneliness in the crowd. The same forces that make big cities engines of invention make them engines of pathology. You cannot have one without the other; they are mathematically the same process viewed from two angles. The infrastructure scales the other way, meanwhile. Doubling a city does not require doubling its gas stations or its road network, only adding about eighty-five percent more. There is a real economy of physical scale. There is also a real anti-economy of social scale. The two run in opposite directions, at the same time, in the same place.
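The same arithmetic can be checked in a few lines. The urban exponents below, 1.15 and 0.85, are the commonly published estimates behind the fifteen-percent and eighty-five-percent figures; treat them as illustrative.

```python
# Kleiber's law: metabolic rate scales as mass ** (3/4).
mass_ratio = 10_000                # ten thousand times heavier than a mouse
print(mass_ratio ** 0.75)          # 1000.0 -- a thousand times the calories

# Urban scaling: outputs ~ population ** 1.15 (superlinear),
# infrastructure ~ population ** 0.85 (sublinear).
for beta, label in [(1.15, "patents, wages, crime"),
                    (0.85, "roads, gas stations")]:
    per_capita = 2 ** (beta - 1)   # per-capita change when the city doubles
    print(f"{label}: per capita x{per_capita:.2f} on each doubling")
# Roughly x1.11 for the social outputs and x0.90 for infrastructure:
# more of everything, good and bad, per person -- and less pipe and
# pavement per person needed to deliver it.
```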
Physicists have a clean word for moments when size or temperature flips a system from one regime to another. Phase transition. Cool water past zero degrees and the molecules abruptly arrange themselves into a lattice; they were liquid one moment, solid the next, with no smooth interpolation. Nothing about the individual molecules changed. The collective behavior did. Social systems do this too, though we rarely give the phenomenon its proper name. Imagine a crowd of a hundred people considering whether to riot. Suppose each person has a private threshold: this one will throw the first brick unprovoked, this one will join once two others have, this one needs ten, this one needs fifty, this one will never join no matter what. If the thresholds are smoothly distributed, each person tips the next and the whole crowd ends up rioting. Remove a single person, the one whose threshold was two, and the chain reaction stalls after the first few bricks; the crowd quietly disperses. The average disposition is identical. The aggregate behavior is not. The difference between a march and a massacre, between a movement and a fizzle, can rest on a single threshold value in the tail of a distribution that no surveyor would ever catch. This is what scale really means. Not “the same thing, but more of it.” A new system, with new rules, new failure modes, new ways of surprising you.
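A minimal simulation of that threshold story, in the spirit of Granovetter’s riot model; the thresholds are invented to make the knife-edge visible.

```python
def cascade(thresholds):
    """Each person joins once the count already rioting reaches
    their private threshold; loop until no one new tips over."""
    rioting = 0
    while True:
        joiners = sum(1 for t in thresholds if t <= rioting)
        if joiners == rioting:       # nobody new tips over; stable
            return rioting
        rioting = joiners

# A perfect chain of thresholds 0, 1, 2, ..., 99: everyone riots.
print(cascade(list(range(100))))                   # 100

# Remove the one person whose threshold was 2. Two people act,
# the person who needed three never sees three, and it is over.
print(cascade([t for t in range(100) if t != 2]))  # 2
```

The average threshold barely moves between the two crowds; the outcome moves from total riot to non-event.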
II.
Take an economist who has spent her career designing rigorous evaluations of policy interventions. She built a kindergarten program in a single school: small classes, an aide trained in a specific technique, a curriculum braiding literacy and emotional regulation. The numbers were beautiful. Test scores up, behavioral incidents down, parents reporting that their children seemed calmer at the dinner table. Replication in a second school produced the same beautiful numbers. The press wrote it up. A foundation funded the rollout. The intervention crossed the river and went city-wide. It dropped much of its voltage on the way. The aides in the pilot had been recruited from a master’s program with a strict admission filter; the city version drew aides from wherever the union sent them, which is to say from people who were good and people who were tired. The pilot parents were the ones who returned the consent forms within forty-eight hours; the citywide version included the parents who could not be reached, the parents who worked two jobs, the parents who were skeptical of any new program because the last new program had not helped their older child. The pilot teachers had volunteered, sometimes effusively; the citywide teachers had been assigned. None of these substitutions seemed obviously catastrophic. Each one shaved a little. Together they cut the effect roughly in half.
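None of those substitutions was a catastrophe on its own; the damage is in the compounding. A sketch with invented retention factors, where only the arithmetic is the point:

```python
# Each substitution the rollout made retains only a fraction of the
# pilot's effect. The factors below are invented for illustration.
substitutions = [
    ("aides hired from a wider pool", 0.85),
    ("parents no longer self-selected", 0.88),
    ("teachers assigned, not volunteering", 0.90),
    ("principal's attention elsewhere", 0.88),
    ("training compressed to fit budget", 0.90),
]

effect = 1.0
for what, retained in substitutions:
    effect *= retained
    print(f"after {what}: {effect:.0%} of the pilot effect remains")
# Five shavings of 10-15% each leave barely half the original
# voltage, and no single line of this ledger looks catastrophic.
```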
This is not a story about a bad intervention. It is the standard arc of nearly every social program that has ever been carefully evaluated. The pilot recruits the enthusiastic, the trained, the prepared, the believers; the rollout takes everyone. The pilot runs in a building where the principal pulled a string for it; the rollout runs in buildings where the principal is fielding a leaking-roof complaint and three custody disputes. The pilot’s costs sit inside a single grant; the rollout’s costs are stretched across an agency with a turnover problem and a procurement system designed in 1973. Call this the voltage drop. Most ideas don’t carry their full charge across the wire from prototype to deployment. The wire has resistance. Some ideas do worse than fade; they invert. One class of interventions, popular in the latter decades of the twentieth century, took troubled adolescents into a prison for a day, let the inmates yell at them, let them feel the heavy door close, and then sent them home shaken. The local theory is compelling. Show them where they are headed. Scare them straight. The phrase itself became the program’s name. When the program was finally evaluated with proper randomization, the youth who attended ended up offending more than the youth who did nothing. The odds of subsequent arrest went up by roughly two-thirds. The intervention had been actively harmful for years, and almost everyone who had been to a session would have sworn to its effectiveness, because the kids did look shaken when they came out.
This points at something deeper than a single bad design. Take the classic psychological studies that ran on small, motivated samples and run them again, properly, in front of a wider population: a great many of them shrink to half their original effect size or disappear entirely. The replication arithmetic is brutal. Of a hundred carefully redone classic experiments, only about a third produce the same effect. The remaining two-thirds reveal not fraud but the structural condition of small-scale work. A sample of twenty undergraduates from one university is not the world, and the things that move twenty undergraduates often do not move a country. The most painful version of this lesson is microfinance. The original village-bank model worked. A small group of women, neighbors and kin, lent money to each other, pressured each other to repay, watched each other’s children, knew each other’s husbands. The default rates were astonishingly low. The story spread. International institutions wrote checks. The model scaled to entire regions and then entire countries. When the rigorous evaluations finally came in, they showed: yes, modest increases in small-business investment. No measurable effect on household consumption. No measurable effect on women’s empowerment. No measurable effect on children’s schooling. Three years out, four years out, the microloan recipients lived lives that looked roughly indistinguishable from the lives of their neighbors who never borrowed. The village magic had been social, not financial. It did not survive being routed through a corporate balance sheet.
III.
Here are two old stories about pests.
A colonial administration in a tropical capital decides that the city has too many rats. The plague has been bad. The streets stink. A bounty is announced: bring in a rat tail, get a small coin. The first week, tails arrive by the basketful. The administrator congratulates himself. The second month, tails are still arriving in volume, but the rats themselves are still everywhere. He sends an inspector. The inspector returns with a peculiar observation. He has begun seeing rats in the alleys with no tails. “They’ve been clipping the tails and letting them go, sir. A rat with no tail still breeds.” The administrator’s calculation had assumed the bounty hunter wanted dead rats. The bounty hunter wanted coins. This is a famous parable. The version with cobras, in a different colonial capital, is slightly less reliably documented but tells the same lesson: a bounty on cobras produced cobra farms, and when the bounty was withdrawn, the cobra farmers released their stock, leaving the city with more cobras than it had started with. The lesson is not “bounties are bad.” The lesson is that the planner who designed the bounty held in his head a model of the citizenry, and the citizenry held in their heads a model the planner could not see. Every individual in the alley knew something the administrator did not. The administrator was not stupid. He could not, by virtue of his position, hold inside one mind the dispersed scraps of knowledge that the city ran on. There was no committee, no central nervous system, no information channel that could have given him the truth: that two streets over, a man was breeding rats in his courtyard.
Central planning is a different kind of problem from local management. A village headman can walk to every doorstep. A national planner cannot. The knowledge that runs an economy is not centralized because no mind is large enough to hold it; it lives in transactions, in tacit habits, in the small calculations people make when nobody is watching. Try to substitute one mind for that distributed memory and the substitution will fail in ways the planner cannot even diagnose, because the missing information is information he was never going to have. A larger version of the same dynamic unfolded in a vast country in the late 1950s, when a government convinced itself that birds were a serious problem. The country had grain. The birds ate the grain. A clean calculation followed: kill the birds, save the grain. The campaign was issued from the capital and organized down through every village. Citizens beat pots and pans without stopping so the birds could not land; bird after bird fell from the sky exhausted, dead from terror or muscle failure. By the end of the campaign, perhaps a billion sparrows had been killed. The harvests that followed were terrible. The locusts, which the sparrows had been eating, did the work the sparrows would have prevented. The fields were stripped of what insects could strip. The bird campaign was not the only thing happening at the same time, though. The country was also being reorganized into collective farms, which scrambled the family-level grain stewardship that had absorbed previous bad harvests. Peasants were being pulled out of the fields during planting and harvest to work backyard steel furnaces, because a separate calculation in the capital had decided the country needed more steel. Local officials, terrified of admitting that production targets had been missed, were sending up falsified harvest reports. The center, operating on those false numbers, was setting procurement quotas and continuing grain exports as if the country had food it did not have. The famine that emerged from this combination was, by any reasonable estimate, among the largest in recorded history; the lower estimates put the death toll near thirty million, the higher near forty-five.
The sparrows were one ingredient in a much larger compound. Each ingredient, taken alone, came from reasoning that made local sense. Killing pests to save grain makes sense. Pooling land to share equipment makes sense. Building steel for industrialization makes sense. Sending up the numbers that pleased one’s superior was, from any individual cadre’s point of view, the prudent thing to do. The disaster did not require any single decision to be obviously wrong. It required the decisions to interact, and it required the interactions to be invisible from any vantage point that could have stopped them. Was the village-level calculation about the sparrows wrong? On the margins of a single rice paddy, no. A sparrow that eats a grain of rice has, in that instant, taken food out of someone’s mouth. The sparrow has also, over the course of its life, eaten a hundred insects that would have eaten ten thousand grains of rice. The village-level intuition was correct about the local event and grotesquely wrong about the ecosystem. The deeper problem was not even the gap between the local and the ecosystem, though. The deeper problem was that the same hierarchy that decided to kill the sparrows had also made it impossible for any peasant or cadre to send up the message: this is not working, the locusts are everywhere, the steel furnaces are eating the harvest, the numbers we are giving you are lies. Scale failure in a hierarchical system is not just about local reasoning being wrong about the whole. It is about information getting worse as it climbs. Anyone reading this with the comfort of distance might think: how could they not have seen? That is the wrong question, though. The right question is: what made it possible for the cleverest people in the country to agree, simultaneously, on a set of policies that turned out to combine into ruin? The answer is not stupidity. The answer is that each policy had its own internal logic, the interactions between them were invisible to any single decision-maker, the information flowing up the hierarchy had been corrupted by the incentives the hierarchy itself had created, and the ideological commitments at the top left no room for the bad news that might have arrived anyway.
IV.
There is a thought experiment that gets passed around at certain dinner tables and certain campaign stops. It goes like this: take the wealth of the world’s billionaires, around sixteen trillion dollars in total, and compare it to the cost of ending world hunger, which a respected international body has placed at about ninety-three billion dollars per year. Sixteen trillion against ninety-three billion. The hunger figure is well under one percent of the billionaire figure. So why don’t we just do it? I have a great deal of sympathy for this question. I have less for the answer it usually receives. Saying “billionaires are powerful and selfish” doesn’t solve the puzzle. The billionaires might in fact be powerful and selfish; the arithmetic is harder than it looks for reasons that have nothing to do with their character. The sixteen trillion does not exist in the form the calculation assumes. A billionaire’s wealth is not a vault of coins. It is, almost always, a stake in an enterprise: a percentage of a company, valued by what the most recent buyer paid for the most recent share. The valuation works because the number of shares being traded at any moment is small relative to the total number of shares outstanding. If a billionaire tries to sell ten percent of her holdings on a Tuesday, the price drops. If she tries to sell all of them, the price collapses to whatever a buyer is willing to pay for a controlling stake in a company whose founder has just publicly liquidated, which is to say very little. The sixteen trillion is a paper number. The actual amount of cash that could be raised by liquidating those holdings, at speed, in a forced sale, with the markets watching, is a fraction of the headline. We do not know exactly what fraction because no one has ever tried; the experiment would destroy the answer it was meant to produce.
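A deliberately crude model of that forced sale. The impact parameter below is invented for illustration; real market impact is messier and worse-behaved, but the direction of the effect is the same.

```python
def liquidate(total_fraction, n_rounds, impact=2.0, paper=100.0):
    """Sell `total_fraction` of a stake over `n_rounds`, marking the
    price down in proportion to each sale. `impact` is a made-up
    illustration parameter, not an estimate of any real market."""
    per_round = total_fraction / n_rounds
    price, proceeds = paper, 0.0
    for _ in range(n_rounds):
        proceeds += per_round * price   # cash raised at the current mark
        price *= 1 - impact * per_round # the sale itself moves the mark
    return proceeds

print(liquidate(0.10, 50))   # ~9.1 raised on 10.0 of paper: patient selling
print(liquidate(0.90, 3))    # ~46.8 raised on 90.0 of paper: the dump
                             # destroys the valuation it was trying to cash
```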
Even granting the money, even imagining that some fairy converted the sixteen trillion into a real bank balance, the harder half of the problem remains untouched. The actual budget shortfall of the world’s main food-distribution body is in the low billions and goes unfunded year after year, and the international agencies that would do the work describe their binding constraint not as money but as access. Food cannot be flown into a country in active civil war without an escort, and the escort needs the consent of warlords who do not want it there. Trucks get hijacked at checkpoints. Warehouses get bombed. Aid workers get kidnapped. The roads in the regions where hunger is worst are unpaved in the rainy season and impassable for half the year. Hunger, at scale, is not a money problem with a money solution. It is partly a money problem, mostly a logistics problem, and increasingly a governance problem. Treat it as purely the first and you have misdiagnosed most of it; you will fund a program that cannot spend the money because the trucks cannot drive on the roads. The thought experiment is not wrong because billionaires deserve their wealth. It is wrong because it confuses two phases of the substance. The cash you can hold in your hand is the ice cube. The cash that would have to flow into a war zone, around a roadblock, past a customs official who wants a bribe, into a refrigerated truck whose battery has died, is the vapor. The arithmetic that works for the cube cannot do the work the vapor requires.
V.
Anyone who has built software knows the version of this that lives in code. You write a program on your laptop. It works on your laptop. You deploy it to a single server. It works on that server. You add a second server, for redundancy, and the first crack appears: when a user writes data to server A and immediately reads from server B, the second server has not yet heard about the write. The two servers disagree about what the user just did. Now you have to choose. You can make them agree, by waiting until both have confirmed every write, in which case the system becomes slow whenever the network is flaky. You can make them available, by letting them serve traffic even when they disagree, in which case the system sometimes shows the user information that contradicts itself. There is a clean mathematical proof, more than two decades old now, that during any moment when the two servers cannot reach each other, you must give up one or the other. No third option exists. The single-laptop version of the program never had to make this choice. The minute the program crosses a threshold, from one computer to two, the choice is forced. From two to two thousand, the choice becomes a permanent feature of the system’s life. The software now has a politics. Different teams take different sides.
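A toy version of the two-server disagreement, with asynchronous replication; the names and timings are invented, but the anomaly it prints is the real one.

```python
class Replica:
    """A key-value store that replicates writes to a peer lazily."""
    def __init__(self):
        self.data = {}
        self.inbox = []          # replication messages not yet applied

    def write(self, key, value, peer):
        self.data[key] = value
        peer.inbox.append((key, value))   # send, but don't wait

    def apply_inbox(self):
        while self.inbox:
            key, value = self.inbox.pop(0)
            self.data[key] = value

a, b = Replica(), Replica()
a.write("cart", "3 items", peer=b)
print(a.data.get("cart"))    # '3 items' -- the server that took the write
print(b.data.get("cart"))    # None -- B hasn't applied the message yet
b.apply_inbox()
print(b.data.get("cart"))    # '3 items' -- consistent, but only eventually

# During a partition, B's inbox never drains. The forced choice:
# refuse to answer from B until it catches up (consistent, not
# available), or serve B's stale view (available, not consistent).
```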
The same threshold appears in the social structure of the people writing the software. A team of three friends in a garage can produce tight, integrated work because every change can be discussed across a table in fifteen seconds. A team of fifty cannot. The number of pairwise conversations between fifty people is twelve hundred and twenty-five; the number of pairwise conversations between three is three. When the codebase grows up under fifty pairs of hands, it inherits the communication structure of the fifty, which is to say it splits along the cracks where teams hand off work. Seen from the outside, the software looks like the org chart that produced it. Try to redesign the software without redesigning the org chart and you will not succeed; the chart will reassert itself within months. A famously botched launch of a national health insurance website happened a little over a decade ago. It crashed on its opening morning. It enrolled six people on the first day. It worked eventually, after roughly a year and several hundred million dollars of additional spending. The post-mortem is interesting for what it found and did not find. It did not find bad engineers. The engineers were good; many of them had worked on serious systems before. It found that the project had been distributed across dozens of contractors, that no single person had final authority over the architecture, that policy decisions had been deferred until late in the process so the implementation had only a few months for what should have taken eighteen, and that the contract structure rewarded delivering modules on time more than delivering a working system at the end. The website crashed, in other words, for organizational reasons that the people writing the code could see but could not fix. The site was an org chart that nobody had remembered to design.
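The communication arithmetic a few sentences back is worth running once: the number of pairwise channels grows quadratically, not linearly, with headcount.

```python
# n people have n * (n - 1) / 2 possible pairwise conversations.
for n in [3, 10, 50, 150]:
    print(f"{n:>3} people: {n * (n - 1) // 2:>5} channels")
# 3 -> 3, 10 -> 45, 50 -> 1225, 150 -> 11175. Nobody decides to
# split into teams; the quadratic decides, and the code splits
# along the same seams.
```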
VI.
So where, in all this, does the old proverb still have a home? Where is the devil legitimately in the details? Within a phase. Once you know which regime you are in, once you have correctly identified that you are dealing with ice or with water or with steam, the details inside that regime matter enormously. A small team that gets the small-team details right will outperform a small team that doesn’t. A village policy crafted with attention to the local conditions will outperform a careless one. The devil-in-the-details proverb is not wrong; it is incomplete. It tells you to attend to the texture of the thing in front of you. It does not tell you that the thing in front of you can change phase, and that when it changes phase your details will become irrelevant overnight. The trap is using detail-thinking across phase boundaries. The founder who built a five-person company by knowing everyone’s spouse’s name continues knowing everyone’s spouse’s name at five hundred people, and the company stops working because attention that was a feature at five becomes a bottleneck at five hundred. The minister who governed a city by walking through it on Saturdays cannot govern a nation by walking through it on Saturdays. The engineer who debugged a single-machine program by reading the logs line by line cannot debug a thousand-machine distributed system by reading the logs line by line, because no human can read a thousand log streams at once.
What looks like a failure of character is usually a failure of category. The same person, applying the same care, fails not because the care has weakened but because the substance under their hands has changed state and the care no longer reaches it. This is also what makes hindsight so cheap. A decision that looks idiotic in retrospect almost always had a justification at the time, and the justification almost always took the form: this worked the last several times it was tried. Reference-class reasoning, the default mode a decision-maker falls back on, is reliable until the phase changes. When the phase changes, the reference class becomes a trap. The leader who orders an invasion and expects a six-week campaign is not necessarily reasoning poorly; she is reasoning from the last six campaigns that looked like this one, and the seventh campaign, the one she is in, has different technology, different politics, a different information environment. The reference class she used was empirically correct for everything except the case she was deciding.
This is why hindsight reads as stupidity. From the outside, after the phase change has revealed itself, the new regime looks obvious. From the inside, before the change has happened, the new regime is the thing the actor cannot see, because she is inside the old one and the old one is generating all her information. There is an old fable about a man who finds a fence in the middle of a field and decides to tear it down because he can see no reason for it. The fable instructs him not to tear it down until he understands why it was put there. The fence might be the load-bearing remnant of a fight he wasn’t around for. The instruction is not conservative in the political sense; it is epistemically modest. It says, the existing arrangement contains information you do not have, and your inability to see why something is there is not strong evidence that the reason has gone away. Most of the reforms that look obvious to outsiders run into a fence the outsiders did not see. The reforms are not wrong because reform is bad. They are wrong because the fence was load-bearing and the reformer didn’t know.
VII.
What does this mean for those of us who actually have to act?
First, I think, it means treating evidence from small scales as a starting hypothesis, not a settled conclusion. A pilot that worked is not a policy that will work. The pilot is the ice; the policy will run as water or steam. Before scaling anything, ask hard questions about what made the pilot work. Was the population representative or self-selected? Did the pilot depend on a charismatic individual or a tight team you cannot clone? Are there spillovers, second-order effects, perverse incentives that only emerge at scale? Will the cost structure change when supply chains, hiring, and oversight stretch across a much larger area? If the answer to any of these is “I don’t know,” another pilot is owed before deployment, this one at the next level up. Five people, fifty, five hundred, five thousand. Each transition is its own test.
Second, it means watching for phase boundaries rather than assuming smooth growth. The signature of a phase transition is that adding one more unit changes the kind of system you have, not just the size. Companies famously hit walls at around twelve employees, fifty, a hundred and fifty, five hundred. Each wall is a regime change; the management style that worked on one side fails on the other. Policies hit walls at the village-to-region and region-to-nation transitions. Software hits walls at single-machine, single-rack, single-region, multi-region. The wise operator redesigns at each wall rather than scaling through it. Scaling means using the same design at higher volume. Redesigning means accepting that you are now solving a different problem.
Third, it means asking, every time you hear “we just need to do X at scale,” what X assumes about information, incentives, and trust. Most plans assume more shared information than the actors actually have. Most plans assume more aligned incentives than the system actually offers. Most plans assume more trust than the parties actually feel. When you have to choose between an elegant centralized plan and a less elegant arrangement that uses information already distributed across the participants, the latter usually wins. Not because elegance is bad, but because the elegant plan has hidden inside it an assumption that someone in the center knows what people in the field know, and that assumption is almost always false.
Fourth, it means refusing to confuse money problems with the other problems money cannot solve. A check, even a very large check, is a tool with a fixed shape. It cannot drive a truck across a damaged road. It cannot get a customs officer to wave through medicine without a bribe. It cannot make two warring factions sit at a table. There are problems money is good at, and there are problems money is bad at, and treating the second set as if they were the first set produces years of misdirected effort and the disillusionment that follows.
Fifth, and this one is harder, it means giving up the cheap pleasure of contempt. The people who designed the ruinous campaign were not stupid. The people who voted for the failed reform were not stupid. The colleagues who designed the system that crashed on launch day were not stupid. Each of them was doing the best reasoning they could from the position they were in, with the information they had, inside a regime whose properties they could not yet see. The “how could they?” reflex is satisfying, but it is the same fundamental attribution error that lets us blame the driver of a crash without asking about the bridge.
This does not mean every decision was equally defensible. It means that the place to look for fault is not character but updating: not whether the original decision was right, which it almost never could be in advance, but whether the decision-makers revised when the evidence arrived. The campaign that killed the sparrows was a forgivable mistake; the refusal to admit, when the locusts came, that the campaign was counterproductive, was not. Most disasters at scale are like this. The original choice was wrong but understandable. The failure to update was wrong and inexcusable.
VIII.
Two ways this argument can be wrong are worth naming before closing. The first is over-application. Some things really do scale. Vaccinations scale. Literacy scales. A clean glass of water for every household scales. The point is not that nothing scales; the point is that scaling is a different problem from doing the thing once, and the second problem deserves its own thinking. Just because most ambitious schemes hit phase walls does not mean every ambitious scheme will. Some things transfer. The work is in knowing which. The second is the temptation to use the argument as a defense of paralysis. “Don’t try to fix anything, because the fix won’t scale.” This is a misreading. The problem is not ambition itself; the problem is ambition that has not budgeted for the difficulty of its own implementation. The cure for an oversimplified plan is a more realistic plan, not no plan. Anything else is a sophisticated form of giving up. There is also a quieter error worth naming, which is the romanticization of small scale. The village is not a paradise. Local knowledge is not always wise; sometimes it is just local prejudice. The fact that a national plan failed does not prove that a hundred village plans would have succeeded. The village plans might have produced a hundred miniature versions of the same failure, less visible only because no one was counting. Distributed failure is still failure. The lesson is not “small good, big bad.” The lesson is that the same substance behaves differently at different magnitudes, and that we must design for the magnitude we are operating at, not the one we wish we were.
Pour the ice water again. Imagine you are the person who perfected the technique of lifting ice cubes out with your fingers, and that you have spent a lifetime refining it. Your reputation is built on it. Your colleagues admire it. Your students learn it. When the water turns liquid, when it turns to vapor, will you have the humility to set the technique down and pick up another? Will you recognize that the substance has changed state and that your virtuosity, hard-won and real, is no longer the right virtuosity? Will you let go of the idea that the problem is the same problem because the molecules are the same molecules?
This, I think, is what the deeper version of the proverb has to say. The devil is in the details, yes; this is what the careful workman knows. The devil is also in the scale, and this is what the wise workman knows: that there is a kind of attention you can pour into a small thing that will not survive being poured into a large one, and a kind of care that has to be redesigned every time the substance changes phase. The first proverb makes you good at one thing. The second proverb is what lets you stay good as the world rearranges itself around your hands. We judge the people who failed at scale as if they failed at the same task we would have aced. We forget that they were not doing what we would have been doing. They were holding steam and being measured against ice. They were trying to write down distributed knowledge from inside a single room. They were optimizing a village policy across a country whose villages did not all look like theirs. This does not exonerate every failure. Some were laziness, some were greed, some were the kind of cowardice that refused to update when the evidence came in. The reflex to assume idiocy, though, the reflex that visits us every time we read the morning paper, deserves to be retired. The world is too big for our intuitions. It is also too small for our patience. We will keep failing at scale until we learn to suspect the very ease with which the answer arrived. The easier the solution looks at the kitchen table, the longer it is going to take in the field.