Budgeting for software projects in “scrum team years.”

It’s often said that if you want to know how long it will take to complete an Agile software project, then you should get started on building it. The theory is that once you have your backlog built out, your stories sized, and you know your velocity thanks to a few months of work, you can get a rough idea of how long it’ll take to complete the whole thing. (Henrik Kniberg explains this in his “Agile Product Ownership in a Nutshell” video.) And that might be fine in some scenarios, but in organizational contexts, it’s often impossible to get started without funding, and it’s impossible to get funding without having a defensible estimate of how long a project will take…which is difficult to do without getting started. What’s to be done?

There are a lot of bad ways to estimate software project costs, and they fall into two camps: qualitative estimates (“I did something like this once and it took about 10,000 hours, so that’s how long this will take”) and quantitative estimates (“this is similar to these three other projects, which have an average of 600,000 lines of code, and historical data shows that a line of code takes 1 minute to write, so this will require 10,000 hours”).

In government, in practice, neither of these is used. Instead, an agency publishes a request for information, vendors provide ballpark figures that are rarely rooted in any defensible math, the agency then makes a request for funding based on those responses (e.g. by tossing out the high and low numbers and averaging the remainders), funding is awarded, the agency publishes a request for proposals, and the bids come back for prices real close to that awarded funding. At any step of the way, if anybody asks why the cost is, say, $20 million…well, that’s not actually explainable. There is no internal logic that underlies this price tag. The result has been 20 years of spiraling costs for custom software in government, as prices have gradually gone up because they are tethered to nothing but the amount of money that vendors say it’ll cost, and they have every incentive to provide a big number.

There is a better way: “scrum team years.”

Federal labor data shows us that the blended hourly rate for each member of a scrum team averages about $125, or about $235,000 per year (at 1,880 hours per year). A scrum team, therefore, will run you $1–2 million per year, depending on whether it’s closer to four members or nine.

When procuring a major custom software project, you can interrogate the reasonableness of the price by breaking down the price into scrum team years. Is the bid for $20 million? Then you should be getting between 10 and 20 scrum team years, perhaps as 5 scrum teams working for 4 years, perhaps as 5 scrum teams working for 2 years, or any number of other mathematically plausible variants. Experienced software developers can compare the complexity of a project to the number of scrum team years and have a sense as to whether the price makes sense.
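
To make that arithmetic concrete, here is a minimal Python sketch. It uses the $125 blended rate and 1,880 hours per year cited above; the function names are mine, and the only assumption about team size is the four-to-nine-member range.

```python
HOURLY_RATE = 125        # blended hourly rate per team member (from the text)
HOURS_PER_YEAR = 1_880   # billable hours per person per year (from the text)


def team_year_cost(members: int) -> int:
    """Cost of one scrum team of the given size for one year."""
    return members * HOURLY_RATE * HOURS_PER_YEAR


def scrum_team_years(bid: float, members: int) -> float:
    """How many scrum team years a bid buys at a given team size."""
    return bid / team_year_cost(members)


bid = 20_000_000
print(team_year_cost(4), team_year_cost(9))   # 940000 2115000
print(round(scrum_team_years(bid, 9), 1))     # 9.5  (nine-member teams)
print(round(scrum_team_years(bid, 4), 1))     # 21.3 (four-member teams)
```

A $20 million bid therefore buys somewhere between roughly 9.5 and 21 scrum team years, which is where the "10 to 20" rule of thumb comes from.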

This works at an agency level, this works at a procurement level, this works at a budgeting level. It allows people who lack deep expertise in software development (which is to say nearly everybody involved in the entire budgeting and procurement process) to have some basic unit of value to compare and debate. (Does this project really require 500 people working for 5 years? What could we get from 10 people working for 6 months? Wait, we’re only getting 10 scrum-team years but we’re paying $50 million? And so on.)

It’s even possible to interrogate the price tag more deeply, if it’s arrived at thoughtfully. In the same way that a scrum team will generally estimate story sizes (“pointing stories”) before pulling them into a sprint, it’s also possible to use experience—some of that qualitative analysis—to make estimates at much larger scales. This will suffer from the same problems that plague standard estimation methods, but for the purpose of establishing a rough order of magnitude, when used by people with significant experience executing comparable projects, it’s an internally-coherent, externally-validatable approach. Reasonable minds can disagree as to whether creating (say) a basic account-creation system might require more or less than one month of work by a scrum team, but breaking down a project into a dozen or so such units and recording the estimated effort level for each unit allows those minds to disagree meaningfully and productively.
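
As a rough illustration of that breakdown, here is a minimal Python sketch. The work units and their low and high estimates (in scrum-team months) are invented for the example, as is the `WorkUnit` structure. The point is only that recording a range per unit gives people something specific to argue about.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    name: str
    low_months: float    # optimistic estimate, in scrum-team months
    high_months: float   # pessimistic estimate, in scrum-team months


def total_team_years(units):
    """Sum the per-unit ranges and convert months to scrum team years."""
    low = sum(u.low_months for u in units) / 12
    high = sum(u.high_months for u in units) / 12
    return low, high


# Hypothetical breakdown; every name and number here is illustrative.
units = [
    WorkUnit("Account creation", 0.5, 1.5),
    WorkUnit("Application intake form", 2, 4),
    WorkUnit("Eligibility rules", 4, 8),
    WorkUnit("Caseworker review queue", 3, 6),
]

low, high = total_team_years(units)
print(f"Estimated effort: {low:.1f} to {high:.1f} scrum team years")
```

If two reviewers disagree, they can now disagree about a single row (say, whether eligibility rules are really four months of work) rather than about the total.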

Scrum-team years are a good estimation tool for aligning program teams, budgeting, procurement, and oversight, giving everybody a common currency of understanding.

How an agency principal should oversee a major custom software project.

The success of many government agencies now hinges on their ability to successfully execute large custom software projects. And yet as a rule, the principals of those agencies lack the ability to ensure that those projects will succeed, or even to oversee them meaningfully. As a result, they have lost the ability to ensure that their agency can achieve its mission.

The Problem

Agencies are pretty specialized—in federal government, their key needs are unique, and in state government they’re one of just 50 (or as many as 56, depending on how you count) agencies with those needs. The truly generic needs can be addressed via commercial off-the-shelf software (COTS), but the mission-unique stuff must be met by what I call load-bearing software, which is inherently custom.

Load-bearing software became a thing in federal government midway through the last century. The ur-example of this is the IRS’s Individual Master File, their core computing system, written in COBOL and IBM System/360 assembly, which debuted in 1960. Load-bearing software became an increasingly common need for government agencies in the 1980s and 1990s, with that software almost entirely internal-facing. That changed in the 2000s and 2010s, as the public gradually came to expect internet-intermediated interactions with agencies, especially for application processes. In 2020, Covid forced agencies at all levels of government to move service delivery online, and three years later there is no sign of that shift receding.

If a state unemployment agency’s UI system doesn’t work, in what sense are they a UI agency? If a state’s EBT system goes down, in what sense do they provide SNAP benefits? If the IRS’s Individual Master File crashes, in what sense are they a taxation agency?

Load-bearing software must work for agencies to achieve their missions. And yet, under the standard outsourcing paradigm, agencies outsource every aspect of the construction, maintenance, enhancement, support, and hosting of this software. In doing so, they outsource their mission. This is a terrifically dangerous practice.

When an agency principal lacks the knowledge or even interest to understand and control these software projects, they are handing their control of the agency to a consulting firm’s project manager. No leader wants to do that.

The Solution

In short, agency leaders need to give a damn about technology procurement, budgeting, oversight, and implementation. Load-bearing software is not a detail—it’s the whole ballgame.

There are four things that principals need to learn if they’re to control their agency’s ability to achieve its mission:

  1. How modern software is made
  2. What’s possible, at what level of effort
  3. How much software costs
  4. How to oversee software development

Let’s review each of these.

How modern software is made

To know how software gets built today, there are six core concepts that agency leaders need to grasp:

  1. User-centered design
  2. Agile software development
  3. Product ownership
  4. DevOps
  5. Building with loosely coupled parts
  6. Modular contracting

There’s a short overview of each of these in GSA’s “State Software Budgeting Handbook,” which I co-wrote in 2019, so I won’t re-explain them here. It’s not enough for agency principals to read a paragraph about each of these, though. With about an hour of training in each of these subjects, agency leaders can have a good base of knowledge about how projects are being executed—or should be executed—by vendors and agency staff.

What’s possible, at what level of effort

Much of the work overseen by agency leaders draws from fields that they’re already equipped to understand the complexity of. If an agency needs to hire 500 new employees, a leader knows intuitively that this is possible, and that it will take longer than two months but less than two years. If it needs to buy new office equipment for 100 people, that’s very achievable, and will cost more than $100,000, but less than $1,000,000. If the agency needs to move into an entirely new building in two months on a budget of $5,000, that is not possible. And so on.

There is absolutely nothing that has prepared an agency principal for understanding the cost of software. There are a pair of xkcd comics that address this concept:

“Tasks,” by Randall Munroe
“Easy or Hard,” by Randall Munroe

The best corrective for this is to observe actual Agile software development teams actually developing software, by joining a series of sprint review sessions for multiple projects. That makes it possible to see what e.g. six people are capable of accomplishing within two weeks of work.

How much software costs

Grasping the cost of software is difficult in the space of government software because of the absurd levels of pricing distortion brought about by decades of procurement practices unsuited to the problem. At a state or federal level, $100 million is a normal price for the development of a load-bearing software system, and that’s a price tag that’s not meaningfully decomposed to any part of that system.

Again, observing actual Agile projects will do a lot of good here. By understanding the level of effort, and connecting that to the billing rate of a vendor, it becomes possible to see that the work produced by our six-person team in two weeks cost $60,000 in billed time. Observing this for a while, it soon becomes evident what software should actually cost.
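
That $60,000 figure is easy to reproduce. Here is a small sketch, assuming a $125 blended hourly rate and a 40-hour week (both assumptions for illustration; a real vendor's billing rate may differ).

```python
def sprint_cost(members: int, hourly_rate: float,
                weeks: int = 2, hours_per_week: int = 40) -> float:
    """Billed cost of one sprint for a team of the given size."""
    return members * hourly_rate * weeks * hours_per_week


print(sprint_cost(6, 125))  # 60000
```

Sitting in a sprint review and knowing that what you just saw cost about $60,000 is a useful calibration point for any larger price tag.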

How to oversee software development

I’ve already written a guide to overseeing major software projects, albeit intended for legislatures, but the 14 listed plays largely apply to an agency principal, either for them to apply themselves (especially “Demos, not memos”) or to ensure that their staff are applying them.

But there are a few leader-specific admonitions that I want to include here.

  • No stoplight charts. Agency principals receive regular updates on major software projects that say everything is going great, with green lights all the way…right up until the update that says everything is red and the project has failed. What happened? In short, strategic misrepresentation. Nobody wanted to provide troubling news to their boss, so as the state of the project got handed up the chain, the view got rosier and rosier, until the principal was told, with every update, that things were going great. The solution to this is to eschew reports in favor of live demos of the actual work being done. If leadership requires reports, make them narrative, authored by the agency’s product owner for the software project.
  • Require that live software be deployed to production regularly. Projects will spin their wheels in isolation. This allows for bad decisions to be hidden, for a lack of progress to be obfuscated. Leaders should insist that software improvements be incrementally delivered to end-users. Continuous delivery paired with continuous user research makes it very difficult to waste much money on a software project.
  • Require weekly ship reports. At the end of each week (or each sprint), a project’s leadership should write a “ship report,” which briefly describes what was shipped since the last ship report, along with what’s coming up, what blockers are preventing the project from progressing, and how much of the budget has been spent on the project to date. For any project that requires particularly close monitoring, it helps to require the inclusion of a list of all user stories that were completed within the period in question—this makes it crystal clear what the project team has accomplished.
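
For what it's worth, a ship report is simple enough to sketch as a small data structure. This is one possible shape, not a standard format: the `ShipReport` class, its field names, and the `budget_total` field are my own assumptions, built from the items listed in the bullet above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ShipReport:
    period: str
    shipped: list[str]
    coming_up: list[str]
    blockers: list[str]
    budget_spent: int
    budget_total: int  # assumption: the total budget is known
    completed_stories: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the report as plain text, one section per heading."""
        def section(title, items):
            body = [f"  - {item}" for item in items] or ["  (none)"]
            return [f"{title}:"] + body

        pct = 100 * self.budget_spent / self.budget_total
        lines = [f"Ship report for {self.period}"]
        lines += section("Shipped", self.shipped)
        lines += section("Coming up", self.coming_up)
        lines += section("Blockers", self.blockers)
        lines.append(
            f"Budget spent to date: ${self.budget_spent:,} "
            f"of ${self.budget_total:,} ({pct:.0f}%)"
        )
        # Only included for projects that need close monitoring.
        if self.completed_stories:
            lines += section("Completed user stories", self.completed_stories)
        return "\n".join(lines)


# Hypothetical example data.
report = ShipReport(
    period="Sprint 1",
    shipped=["Account creation form is live in production"],
    coming_up=["Password reset"],
    blockers=[],
    budget_spent=60_000,
    budget_total=1_000_000,
)
print(report.render())
```

An empty blockers list prints as "(none)" rather than disappearing, so a reader can tell the difference between "no blockers" and "nobody filled this in."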

This is all meant to avoid precisely this scenario:

“Deloitte presented much too rosy of a picture to us,” [Governor Gina Raimondo] said. “I sat in meetings with Deloitte and questioned them and they gave us dashboards that showed us everything was green and ready to go, and the fact of the matter was it wasn’t.”

“Raimondo Faults Vendor Deloitte For Delivering ‘Defective’ UHIP System,” The Public’s Radio, Feb. 15, 2017

It can be tremendously difficult for the principal of an agency to oversee a load-bearing software project, in no small part because leaders tend to get in the way, providing bad ideas, issuing self-important new project requirements, and measuring the wrong things. Learning how modern software is made will ensure that principals set their expectations properly and can engage in an appropriate fashion. Learning what’s possible, at what level of effort, will allow principals to understand and make demands that are reasonable. Learning how much software costs will allow principals to have reasonable expectations of costs, both in preparation for projects and when executing them. And learning how to oversee software development will draw together the prior three skills so that principals can effectively manage the projects on which the viability of their agency’s mission depends.

This approach will allow an agency principal to take back control of their agency from vendors, and to stop outsourcing their agency’s mission.

“Agile” versus “agile.”

When writing about Agile software development, I always capitalize the word. This isn’t an affectation, but instead an effort to communicate an important distinction.

The word “agile” has been a la mode for a few years now. Organizations should be “agile,” teams should be “agile,” leadership should be “agile,” employees should be “agile,” software should be “agile.” This use of the word is intended to indicate being nimble, flexible, and adaptive.

This is almost completely unrelated to “Agile” software development. Agile is a software development practice, summarized as valuing:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

There are different methodologies for implementing Agile—Scrum being the most common—but, in general, capital-A “Agile” means delivering software every two weeks, with all completed work being based on user needs that have been identified and validated through user research.

Lowercase-A “agile,” on the other hand, means none of that. It’s puffery. It means nothing.

When a government agency or a contractor says “oh, yes, we’re agile,” it’s important to find out if they mean “agile” or “Agile.” And when communicating with that audience, it’s important to make clear if you mean “Agile.” The mere capitalization of a letter isn’t the totality of how to accomplish that—it’s better to ask clear and direct questions about how they build their software—but it does help to consistently write “Agile” when you mean Agile software development and “agile” when you mean nimbleness and flexibility. Even somebody only dimly aware of Agile software development is liable to take note of the capitalization of the word and realize that something very particular is being communicated there.

Capitalizing “Agile” helps to be clear in communications. I recommend it.

The work before the work: what agencies need to do before bringing on an Agile vendor.

Government agencies often hire Agile software vendors to build software for them, but then fail to do the pre-work that would allow the vendor to be successful. Interfacing a great scrum team with a standard government IT shop is like dropping a Ferrari engine into a school bus. There’s work to be done up front before there’s any sense in bringing on a team.

18F, the federal government’s software development shop, has done a lot of work to address this problem, and much of what I know about how to do this right comes from my four years there.

I’ve seen what happens when a high-performing team is dropped into a low-performing agency. The agency spends weeks or months running the team through mandatory security trainings, getting the team their PIV cards, procuring and issuing them agency laptops, getting them accounts on the VPN, getting them access to the various servers that they’ll be using, etc. The team expected to start writing software on day one, but instead they’re left twiddling their thumbs at a collective $1,000/hour. They get bored, and after a few weeks, anybody competent gets sprung from their purgatory and put onto a project that’s actually doing stuff. When the team can finally get to work, all that remain are the people who weren’t good enough to escape, and morale is low.

Don’t do this.

Agencies need to prepare for vendors, not a couple of weeks before the vendor shows up, but months beforehand. Ideally before a solicitation is even issued. That’s because it’s a lot of work! It requires user experience research, journey mapping, process automation, coordination between agency silos, and perhaps even a prior procurement. Here are some of the specific things to do:

  • Allow the vendor to work entirely in the cloud. To the greatest extent possible, make it possible for the vendor to never touch your environment. That means that your agency needs to contract with a cloud vendor (but you did this long ago because it’s 2023, right?), ideally Microsoft Azure or Amazon Web Services, since those are the most widely used. That way the vendor can replicate your environment within their own cloud environment, and never have to touch your cloud environment, and certainly not your agency’s VPN and warren of physical servers. This eliminates a big part of the onboarding process, with the happy side benefits of significantly reducing stress among the vendor team (they can’t break your environment if they don’t have access to it) and reducing the cost of the vendor’s professional liability insurance policy.
  • Don’t make the vendor use GFE. Government-furnished equipment is awful, the cheapest stuff that Dell or HP makes, bought in lots of 1,000. Developers are disproportionately likely to be Mac users, but even the Windows users don’t want to use the laptops you got from the lowest bidder. They want 32 GB of memory so they can run the software entirely in Docker containers; your laptops have 8 GB of memory because they’re optimized for people using Word and Outlook. It’s like hiring a great woodworker to build stuff for you, but forcing them to use your collection of Harbor Freight tools. If they’re working on agency equipment, then they have to jump through a bunch of hoops like acceptable-use policies and mandatory trainings, and you have to do things like actually procure the hardware, send it to them, and get it back when they’re done. No Agile team wants to use your lousy equipment, and you don’t want to deal with issuing it, so just don’t.
  • Put together a journey map of the vendor onboarding process. Once you’ve ensured that the vendor team will work in the cloud on their own equipment, step through the process of what’s left in bringing on a vendor. Talk to employees of vendors who have recently gone through it, or are currently going through it. Map out every step. Now, worst case, you have a document you can share with the new vendor to understand what’s ahead of them and where they are in that process. But, better: optimize the hell out of that process, removing anything that you can, simplifying anything that you can. Every hour that you shave off this process will save $1,000.
  • Study and document the process when the vendor starts. The first time you bring on an Agile vendor, the process you’ve put together won’t be done. It will be better, but there will still be frustrations. Discuss this with the team during their first days on the project, ask that it be a subject of their first sprint retrospective, and turn that feedback into specific changes for the next time that a new person joins the scrum team.
  • Designate a product owner. Agile projects’ success is heavily dependent on the product owner’s abilities to do their job well. I’ve written about this elsewhere, but the short version is that the agency needs an empowered product owner to start to take an ownership role over this project before the vendor shows up. This person will be the vendor team’s primary interface with the agency, the fixer for onboarding problems, the smiling face who will greet them at 9 AM on day one of sprint one. Have them in place many weeks before the vendor starts.
  • Create a path to production. Two weeks after the vendor starts, they’ll have code that’s ready to go to production. If your agency has an authority to operate (ATO) process that requires completing a 250-page system security plan (SSP) as a Word file to get anything in production, you’re going to have a bad time. You need to pilot a new ATO process that can move at the speed of software development, or else your vendor team will be unable to get their software before actual end users. And Agile is impossible without that crucial feedback loop. This problem is really hard. Don’t be fooled by the fact that this is a single bullet point in a list of six items—it’s of greater difficulty and complexity than the others. This is a 6–12-month process for agencies, probably longer for federal agencies. Consider making the immediate project a pilot, so that instead of proposing that your CIO overhaul the ATO process entirely, you’re simply proposing that an experiment be run to see if a continuous ATO process could work.
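
The journey map described in the list above lends itself to a quick cost tally. Here is a minimal sketch that prices each onboarding step at the $1,000/hour collective team rate. The step names come from the onboarding story earlier in this piece, but the hours are made up, and the sketch assumes the whole team is blocked for the full duration of each step.

```python
TEAM_COST_PER_HOUR = 1_000  # collective team rate cited in the text

# (step, working hours the team is blocked): durations are hypothetical
onboarding_steps = [
    ("Mandatory security trainings", 16),
    ("PIV card issuance", 80),
    ("Agency laptop procurement", 120),
    ("VPN accounts", 24),
    ("Server access", 40),
]


def idle_cost(steps, rate=TEAM_COST_PER_HOUR):
    """Cost of the team waiting on each step, keyed by step name."""
    return {name: hours * rate for name, hours in steps}


costs = idle_cost(onboarding_steps)

# List the most expensive steps first: those are the ones to cut or simplify.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32} ${cost:>9,}")
print(f"{'Total':32} ${sum(costs.values()):>9,}")
```

Sorting by cost makes the case for the first two bullets on its own: in this made-up example, laptops and VPN accounts are steps that simply vanish if the vendor works in the cloud on their own equipment.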

You don’t want to do all of the hard work of procuring a top-notch scrum team only to send them face-first into a brick wall of bureaucracy. You’ll lose all of the momentum, lose the best members of the team, waste $40,000/week, and when the project finally starts it will be with a demoralized team. By doing the right prep work, you can leave that team free to do what they do best—develop software—and get the best possible performance out of the vendor.

“Customized COTS” is the worst of both.

Some vendors who sell commercial off-the-shelf (COTS) software to government bristle when their software is described as such. They want you to know that it’s not COTS: it’s “customized COTS” or “modifiable COTS.” That is intended to reassure agencies that their software is flexible, that it can meet the agency’s needs. But, in fact, “customized COTS” is actually much worse than COTS.

Vendors are generally eager to have their offerings fall under the umbrella of “COTS,” because both state and federal government heavily favor buying existing software over having custom software built. This is good and sensible. It would be foolish to have a custom word processor built when Microsoft Word and Google Docs exist, and are available for licensing at a significantly lower cost than custom development.

But the COTS label is a millstone around the neck for vendors of software that drives the operations of agencies operating under highly localized regulatory regimes. Take unemployment insurance. Every state labor agency operates under a dizzying array of state and federal laws and regulations about who can get coverage, for how long, for how much money, under what circumstances, on what timeline, through what qualification processes. And those laws and regulations change continuously. I don’t want to say that it’s impossible to build a COTS tool that could handle all of those variations, but I will say that it would be enormously difficult, and would require some very specific architectural decisions that I know that none of the vendors have made. In practice, every new state that becomes a customer for such a system would introduce vast new complexities into the code base, requiring that the “COTS” product actually be forked, with customizations made for that new state. That brings its own complexities, because any changes across the code base need to be grafted manually into each forked copy. This is no longer “COTS,” by any reasonable definition, but is instead what 18F calls “UMOTS,” or “Unrecognizably Modified Off the Shelf Software.”

Customized COTS is just custom software that the agency doesn’t own, the equivalent of paying for extensive renovation to a home that you are renting. If a state is going to use COTS, they should do so because they are happy with the software as it exists, and do not require modifications. Nobody would buy Microsoft Word and then demand that Microsoft add an essential feature that is missing. That violates the purpose of COTS, which is that the vendor has made those decisions for you. If you don’t like their decisions, don’t buy their product.

Sean Boots has a good test for the legitimacy of COTS: “If you can get a software solution to successfully meet your needs in one day, it’s a real COTS product.” I propose a corollary for testing the legitimacy of customized COTS: Will all customers receive identical software updates? If yes, it’s probably COTS. If no, it’s customized COTS, and you’re paying to renovate a house you are renting.

COTS can be great. Custom software can be great. Customized COTS is a tar pit, a way to pay for extensive renovations to software that you do not own, and now feel that you cannot leave, because the sunk cost fallacy is real. Don’t license customized COTS.

Why governors put this over here, with the rest of the fire.

It happens at least once in every gubernatorial administration: presented with a disastrous, multi-year, failing software project that’s preventing an agency from accomplishing its mission, the governor awards a big contract to a big vendor, maybe even the vendor that’s the source of the problem. Some major culprits are unemployment insurance, enterprise resource planning, Medicaid, child welfare, and payroll—all load-bearing systems for their agencies. Solving these failures by signing another big contract nearly always makes things worse. So why do governors do this?

To serve as governor is to be presented with a never-ending stream of decisions to be made, all of which have been vetted through several layers of people. Those decisions are generally teed up to include options, in the form of a right option and a wrong option, with the governor’s advisors fervently hoping that their principal will simply make the “right” choice. There is rarely time for the governor to go deep in any area. A state is a stage full of spinning plates; the governor’s job is to go where directed and give a plate a quick push, and to repeat this many times each day, for 4–8 years.

Decision-making at this level is all about triaging. The easiest option is the preferred option. It’s better to dispose of a problem permanently than temporarily, better for a longer time than a shorter time. The top priority is to get things off the governor’s desk.

An animated GIF of a man carrying a flaming fire extinguisher cautiously. He's saying "I'll just put this over here, with the rest of the fire."
Maurice Moss, in “The IT Crowd,” triaging.

This imperative is embodied in the chief of staff, who spends the bulk of their time blocking for the governor, ensuring that questions only come to the governor when they are ripe for a decision, and that their principal has enough information to be able to make that decision.

So what should a governor do when faced with a failed software system that is preventing an agency from delivering on its mission? The correct response is to have the state take control from the vendor, because no vendor will ever care about the state’s mission as much as the state. The state needs to move to shorter contract periods, an Agile delivery cadence, and agency product ownership, and to root all work in user research. But you can’t tell that to a governor. Literally, you can’t—the chief of staff will hurl themselves in front of your body to stop you. Because what you’re saying to that governor is “what if, instead of making a single decision, we replaced this with a large amount of time-consuming work and a series of decisions over the course of months or years?” It’s completely contrary to the entire process used to operate governors’ offices.

What the governor is hearing from vendors, on the other hand, is very compelling. The incumbent vendor and their competitors are all saying the same thing: write us a big check and we’ll make all of your problems go away. Will they actually make all of those problems go away? Absolutely not. Will throwing more money on the money fire make the fire go away? No, that’s not how fire works. But this message from the vendor is exactly what the governor’s office is optimized for, and exactly what the agency secretary’s office is optimized for. They cannot escape the siren song of the vendors, and nobody warned them about the need for mast-lashing and beeswax.

Even if a governor knew that the problem would return with full fury in 6–12 months, with accompanying admonishing headlines and stern editorials, it’s entirely possible that they would still elect to award that big contract, simply because it makes the problem go away for 6–12 months. A lot can happen in 6–12 months! Maybe something more important will be in the news cycle then. Maybe they’ll be out of office. Maybe they’ll be less busy and will have time to really buckle down and pay attention to this UI / PFML / MMIS / childcare / whatever situation. But all of that is a problem for Future Them. Current Them has other stuff going on.

In short, governors make the worst decision because, in that moment, it feels like the safest decision, and it may even be the safest decision, although only in a political sense. Although the outcome is generally terrible, governors are behaving rationally. Without changing the incentives, they’ll keep throwing money on the money fires.

An Excel error caused a $201 million state budget shortfall.

On Monday, the Richmond Times-Dispatch broke the story about a thorny budgeting problem for Gov. Glenn Youngkin that illustrates how bad technical practices can lead to bad public policy outcomes:

Local school divisions in Virginia just learned they will receive $201 million less in state aid than they expected — including $58 million less for the current K-12 school year that is almost three-quarters done.

The Virginia Department of Education has acknowledged the mistake in calculating state basic aid for K-12 school divisions after the General Assembly adopted a two-year budget and Gov. Glenn Youngkin signed it last June. The error failed to reflect a provision to hold localities harmless from the elimination of state’s portion of the sales tax on groceries as part of a tax cut package pushed by Youngkin and his predecessor, Gov. Ralph Northam.

The Washington Post provided more specifics about the source (or perhaps manifestation) of the mistake:

The problem originated with an online tool that allows school districts to see how much funding they should expect from the state, a number that takes into account the district’s number of students, how much it receives in property tax revenue and other factors.

The tool has been up since June 2022, allowing districts to build their budgets around the estimations. But last week, someone — the state would not say who — realized that the numbers were wrong. The miscalculation occurred after the state failed to account for funding changes connected to the elimination of the state’s tax on groceries, which took effect Jan. 1.

It’s not clear whether this was a conceptual problem (a failure to realize that it was necessary to account for funding changes) or a technical problem (an error in implementing that math). If the latter, this is a high-impact error from a software failure.

(I’d be remiss if I didn’t point out that some reasonable people are suspicious of this explanation, pointing out that Youngkin is no supporter of public education, and that it’s convenient that this mistake aligns with his policy preferences. But I think it’s much more likely to be a mistake, one that a governor would have no knowledge of or insight into. But that explanation is awkward for Youngkin, who has presented himself as a hard-nosed budget wonk whose private sector financial experience translates to fiscal competence. And yet, this.)

It’s instructive to look at the “online tool” in question, which turns out to be an Excel file. (Here’s a Wayback Machine link to the Excel file, because I expect that the problematic one will disappear. I’ve also put it in Google Sheets.) It has 38 worksheets, with a heterogeneous and puzzling series of titles like “Enroll. & At-Risk,” “FINAL SOURCE DATA,” “March 31, 2021 ADM,” “ASRFIN Queries,” and “Bedford County-City.” The message on the first worksheet would seem to indicate that the state published this without removing all of the placeholder text, which raises the question of what else might be unreviewed or incomplete.

Many of these worksheets are dizzying, some hundreds of columns wide, most containing unexplained acronyms like “DABS,” “RLE,” “PPAs,” “ADJ ADM.” I don’t doubt that these make a lot of sense to state and local budget officials, but I have none of the subject-matter expertise to make heads or tails of them.

Excel is a fine way to build lightweight calculation software (you can build some pretty sophisticated systems in Excel and Google Sheets), but it shouldn’t serve as load-bearing infrastructure. Excel files can’t be meaningfully diffed or version-controlled using standard revision control systems (e.g., Git). It’s impractical to run automated tests on Excel files as part of a continuous integration process. Excel files become unwieldy as the number of worksheets increases; I can’t say where the tipping point is, but it’s for sure lower than 38. For a tool as critical as this one, it’s important to be able to at least perform some smoke tests, so you can check that providing particular sets of financial assumptions returns the correct numbers, and those should be run automatically every time that the tool is updated.
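
To make “smoke tests” concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the function `state_basic_aid`, its parameters, and the formula itself are invented for illustration and are not Virginia’s actual funding math. The point is only that once a calculation lives in code, a handful of known-input, known-output checks can run automatically on every change.

```python
from decimal import Decimal


def state_basic_aid(adm, per_pupil_amount, local_share, hold_harmless=Decimal("0")):
    """Toy formula (an assumption, not the real one): the state pays its
    share of per-pupil costs, plus any hold-harmless payment."""
    total_cost = Decimal(adm) * per_pupil_amount
    state_share = total_cost * (Decimal("1") - local_share)
    return (state_share + hold_harmless).quantize(Decimal("0.01"))


def run_smoke_tests():
    """Known inputs must produce known outputs; run on every update."""
    # 1,000 students x $6,000 x (1 - 0.40) = $3,600,000.00
    base = state_basic_aid(1000, Decimal("6000"), Decimal("0.40"))
    assert base == Decimal("3600000.00")

    # A hold-harmless payment must raise aid by exactly that amount.
    with_payment = state_basic_aid(
        1000, Decimal("6000"), Decimal("0.40"), Decimal("250000")
    )
    assert with_payment - base == Decimal("250000.00")

    # With no students, the only aid is the hold-harmless payment.
    assert state_basic_aid(
        0, Decimal("6000"), Decimal("0.40"), Decimal("250000")
    ) == Decimal("250000.00")
    return True


if __name__ == "__main__":
    run_smoke_tests()
    print("smoke tests passed")
```

The second check is the interesting one: if somebody dropped the hold-harmless term from the formula, that assertion would fail the moment the change was made, rather than months after budgets were built on the numbers.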

No doubt this started as some small, simple file, many years ago, put together by somebody at the Department of Education for internal purposes, shared informally with some municipalities, but gradually shared more broadly and standardized on. And then it grew and grew, without ever being given resources commensurate with its newfound importance. Surely most other state agencies are vulnerable to similar failures with similar impacts due to the same problem.

Software failures causing public policy failures are a defining feature of our era. In that sense, this is a normal failure, although I’m not familiar with another instance of an Excel error in a government budgeting document leading to such a large financial problem. But history does give us one example to draw on: Fidelity Investments’ 1994 omission of a minus sign (-) from a spreadsheet, which turned their $1.3 billion loss into a $1.3 billion profit. They dutifully notified the three million investors in the Magellan mutual fund that they’d receive a dividend of $4.32 per share…only to have to notify all of them that they’d actually receive nothing, after outside auditors caught the mistake.

Spreadsheets are great tools. But some applications require more rigor, and we can see here that mutual funds and state budgeting are two strong examples.

Pity Francis M. Wilhoit.

You’ve got to feel for Francis M. Wilhoit. Born in 1920, the Harvard-trained political scientist spent his entire career in academia, working as a professor at Iowa’s Drake University. He published on subjects such as nationalism, equality in freedom, and the impact of populism on Black residents of Georgia. The topic that really motivated him was his opposition to racism. His PhD thesis, on the politics of Massive Resistance, was published as a book in 1973. While he was no Mearsheimer, Walt, Huntington or Waltz, his career was one to be proud of. He died in 2010.

Despite all of this, what Wilhoit is most remembered for is this one, brief quote:

Conservatism consists of exactly one proposition, to wit:

There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect.

It was an astute observation, and a sensible one for somebody who had spent so long studying populism and racism in U.S. politics.

Unfortunately for Francis M. Wilhoit, this remark was in fact written by one Frank Wilhoit (no relation) in a comment posted to Crooked Timber in 2018, eight years after Francis M. Wilhoit’s death. The shared name is a complete coincidence.

Frank Wilhoit is a 63-year-old classical music composer who lives in Ohio. In an interview with Slate this past June, he expressed horror at his quote being often attributed to Francis M. Wilhoit, not because he feels he’s owed some credit, but because he regards it as unfair to the deceased political scientist. He told his interviewer:

I had absolutely no right to create that kind of confusion, to pose that kind of insoluble problem for the custodians of his legacy. They will be playing whack-a-mole with that misattribution for all future time.

It’s clear in the interview that Frank Wilhoit is a thoughtful, erudite guy. But, again, he is not a political scientist, so it remains impressive that he managed to write a blog comment that has accidentally eclipsed the reputation of Francis M. Wilhoit. It’s like some baseball fan named Tony Peña being randomly selected from among the attendees at Fenway Park during the seventh-inning stretch, to come onto the field to try to hit a few balls pitched by Roger Clemens…and hitting a 510-foot homer. For decades after, people would understandably assume that it was the “real” Tony Peña who did it.

Perhaps it’s for the best that Francis M. Wilhoit didn’t live long enough to see a homonymous composer receive vastly more recognition for a blog comment than he did for his life’s work.

Car buyback offers are bad CX.

A few years ago, my wife and I bought a new car: a Chevy Bolt EV. About a year later, the dealership began a drumbeat of emails, phone calls, letters, and postcards, each communication proposing that we sell our car back to them for a different dollar value each time. The acute shortage of both new and used cars led to inflated values, and the dealership presumably wanted to make the most of this. But these proposals mostly annoyed us, because they were offering to make our lives worse.

We hadn’t bought a car because it seemed fun or interesting, but, like most people, we did so because we needed a car. When the dealership proposed that they buy back our car, they were proposing to create a problem for us: the problem of not having a car. Given that very shortage of new and used cars, we had no interest in trading our car for money. After all, it was just a year prior that we’d traded money for our car! We need a car. Not having a car is not an attractive offer.

I actually answered the phone for one of these promotional calls, and was able to tell the dealer why their offer held no allure. I explained that we would entertain an offer to exchange our car for a newer EV. Instead of telling us that they’d pay us $X for our 2019 Bolt, we’d rather they propose that we trade our 2019 Bolt for, e.g., a 2023 Bolt for a cost of $Y. This did not compute, apparently, because over a year later, we continue to receive a buyback offer every couple of months, none proposing anything beyond us no longer having a car.

I guess this works, or else they wouldn’t do it, but it seems like it would work a lot better if their proposal wasn’t to create a problem for people, but instead to at least propose a problem and a solution, all in one go.

Apple Card account verification considered harmful.

My wife was startled awake from her nap by her iPhone’s ring. She answered, groggily. The caller informed her that they were calling from Apple, and needed to verify her account. This made a sort of sense, because just the day prior, she’d communicated with customer support about a problem with our new, Apple-branded credit card. The caller said they’d be sending her a text message, and could she please read the verification number? The text message arrived immediately, and my wife dutifully read off the six-digit number. The caller thanked her and hung up.

As she finished waking up, she realized this seemed strange, on a few levels.

It’s not just strange: it’s a well-known scam.

I’m going to give away the ending here. It wasn’t a scam. This was a legitimate call, legitimately representing Apple.

Here’s how this scam works. The criminal selects a target from a list of known customers, obtained thanks to a prior data breach of First Bank of New York (to invent an example). He tees up First Bank of New York’s “reset your password” functionality, which is designed to help out customers who have gotten locked out of their accounts. It will text a verification code to their phone number of record, which they can type into the website to verify their identity, and then select a new password. He then calls the target, claiming to be with First Bank of New York, and asks whether they could please verify their identity. And he clicks that “Submit” button on First Bank of New York’s “reset your password” page, triggering a text message to his target. The target dutifully reads off the number, the criminal types it in, and, boom, he has access to the target’s bank account.
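
The weakness is easier to see as a toy model. This is a sketch under loud assumptions: `ResetService` is a made-up stand-in for the invented bank’s reset flow, the phone number is fictional, and no real service’s API is being described. What it shows is that the service only checks that the right code came back, not who typed it.

```python
import secrets


class ResetService:
    """Hypothetical reset flow: text a one-time code to the phone on
    record, then accept that code from whoever submits it."""

    def __init__(self):
        self.phones = {}   # account -> phone number on record
        self.inboxes = {}  # phone number -> list of texted codes
        self.pending = {}  # account -> outstanding one-time code

    def register(self, account, phone):
        self.phones[account] = phone
        self.inboxes.setdefault(phone, [])

    def request_reset(self, account):
        # Anyone who knows the account name can trigger the text message.
        code = f"{secrets.randbelow(10**6):06d}"
        self.pending[account] = code
        self.inboxes[self.phones[account]].append(code)

    def confirm_reset(self, account, code):
        # The only check: does the submitted code match the texted one?
        expected = self.pending.pop(account, None)
        return expected is not None and secrets.compare_digest(expected, code)


def simulate_relay():
    """The caller triggers the reset; the target reads the code aloud."""
    service = ResetService()
    service.register("target", "+1-555-0100")
    service.request_reset("target")                   # caller clicks "Submit"
    code_read_aloud = service.inboxes["+1-555-0100"][-1]
    return service.confirm_reset("target", code_read_aloud)


if __name__ == "__main__":
    print("caller gained access:", simulate_relay())
```

From the service’s side, a code relayed over the phone is indistinguishable from one typed in by the account holder, which is why reading a code to an inbound caller is never safe.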

This, obviously, was what happened here. I set to work immediately, examining her call log and the text message, to figure out who was stealing what from us, to try to act before they could.

GS authentication code: 524886. Contact your GS team if code was not requested. Txt STOP to end or txt HELP
The text message from Apple, along with me trying and failing to tease more information out of the service.

Within a couple of minutes, I realized that the one and only thing that could be relied on here was the text message. The scam would only work with a text message actually triggered by a real service. Apple doesn’t confirm identities with text messages, but instead with an OS-level service. So it couldn’t be Apple.

I had two clues to go on: the short code (87175) and “GS.” Friends immediately helped me brainstorm what “GS” could be, and one suggestion (Goldman Sachs) seemed plausible, since LexisNexis’ identity verification service uses that short code, and that was a sensible vendor relationship.

The Apple Card is with Goldman Sachs. Somebody was stealing our credit card! I immediately locked our cards, which is a trivial setting on iOS, and my wife called the Apple Card’s support number to report the fraud.

That was when the employee at the support number—an apparent Goldman Sachs employee—provided some surprising information: the call had been legitimate. Goldman Sachs, in Apple’s name, had used a classic identity-theft ruse.

My wife asked what the purpose of the phone call was and she was told that it was to verify her identity. …What? They’d done that just a month prior, when we opened the account. And their text message went to the very phone number that they were calling! The text message added nothing! That the message came from “GS,” while the caller claimed to be from Apple, only added to the confusion. The call did absolutely nothing to verify my wife’s identity, nor could it possibly have done so, as designed.

Screenshot of an Apple ID verification code.
This is what Apple’s legitimate verification code service looks like.

Apple and Goldman Sachs are teaching customers that it’s not just OK, but actually necessary to read verification codes out to strangers who call on the phone.

I’m appalled. Obviously, Apple knows better than to employ a pattern common to fraud (I’m aware of no other aspect of their business where they’d allow something like this), but Goldman Sachs should know better, too. I’ve had an American Express for many years, and every interaction I’ve ever had with them, including several cases of fraud, was handled flawlessly. Their security practices are top-notch. I’d assumed that was industry standard, but clearly I was wrong. I’d assumed that Apple’s involvement with the Apple Card would lead to extraordinary security practices, but clearly I was wrong about that, too.

I’d intended to switch from American Express to the Apple Card over the next few months, but now that doesn’t seem like a good idea.