The deposit operations centre of a large American commercial bank — Bank of America, JPMorgan Chase, Wells Fargo, U.S. Bank, any of the regionals of any size — is a place that, by the standards of any other industrial facility, runs quietly. It is, in most cases, a room in a windowless building on a service-road exit somewhere outside a mid-sized American city, with the bank’s name on the building (where it appears at all) only in small lettering near the loading dock. Inside, a sequence of high-speed document scanners pulls checks past an imaging head at perhaps ten or twelve a second; behind the scanners, a row of servers runs character-recognition models against each image, extracts the routing number, account number, check number, amount, and signature, compares each field against the bank’s internal records, flags the exceptions to a human reviewer in a separate department, and posts the rest to the customer accounts. The room handles, on a typical business day, perhaps a hundred thousand checks. Across all the deposit centres of all the American banks, the daily volume is something on the order of forty million, and that figure does not include the further forty or fifty million paper-equivalent items — payment remittances, lockbox payments, mortgage statements, insurance premium notices — that pass through the parallel lockbox operations of the same banks for their corporate clients. The total volume of documents read by character-recognition systems at American banks in a single business day is something close to a hundred million.
This post is the third of three correctives — the first, yesterday, on the emotional-support use of frontier language models; the second, the same day, on industrial visual quality grading at American beef-packing plants — to the five posts on AI-lab trajectories I wrote earlier this month. The two prior correctives argued that the analyst class has a coverage problem: that the largest and most-embedded AI deployments in 2026 are systematically the ones the trade press cannot see. This post closes the trio by treating the third of the three rooms the slaughterhouse post named — alongside the late-night chat and the kill floor, the back of the bank — and by making, on the basis of this third example, the strongest version of the structural argument the three posts together advance.
The Room and the Number
A few facts, in the order one would want them.
The American banking industry processes, in a year, somewhere between eleven and thirteen billion paper or paper-equivalent items. The number is down from a peak of roughly fifty billion in 1995, before electronic payments and direct deposit and online bill-pay had displaced the dominant payment instrument of the mid-twentieth century, but it remains the dominant volume of the document-processing economy and a number that, multiplied by the cost of human handling that would otherwise be required, runs into the hundreds of billions of dollars a year of avoided labour cost. Every one of those items is, in modern practice, read by a character-recognition model. The Federal Reserve’s check-processing operations, the major commercial banks’ deposit centres, the corporate lockbox operations, and the back-office sorting houses of every payment processor in the country — all of these run on the same basic apparatus: a high-speed scanner, a character-recognition pipeline, a signature-verification model, an exception-handling queue.
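The apparatus described above (recognise each field, compare it against records, post the clean items, queue the rest for review) reduces, at its core, to a confidence-threshold routing decision. A minimal sketch in Python; the field names, the 0.95 floor, and the queue labels are illustrative assumptions, not any bank's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class FieldRead:
    value: str          # the recognised text
    confidence: float   # model confidence, 0.0 to 1.0

# Illustrative threshold: below this, the item goes to a human reviewer.
CONFIDENCE_FLOOR = 0.95

def route_item(fields: dict[str, FieldRead]) -> str:
    """Post the item automatically, or queue it for human review."""
    for name, read in fields.items():
        if read.confidence < CONFIDENCE_FLOOR:
            return f"exception:{name}"   # sent to the exception-handling queue
    return "post"                        # straight-through processing

# A clean read posts; a doubtful amount goes to a reviewer.
clean = {"routing": FieldRead("021000021", 0.99),
         "amount":  FieldRead("125.00", 0.98)}
doubtful = {"routing": FieldRead("021000021", 0.99),
            "amount":  FieldRead("125.00", 0.61)}
print(route_item(clean))     # post
print(route_item(doubtful))  # exception:amount
```

The design point is the one the essay makes: the model never decides anything about the money; it decides only whether its own read is trustworthy enough to skip the human.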
Beyond banking, the Internal Revenue Service processes, each year, roughly one hundred and sixty million individual tax returns and somewhere over four billion information returns — W-2s, 1099s, 1098s, K-1s, and the rest of the alphabet of forms by which one income event is reported to the federal government. The IRS’s three Submission Processing Centres — in Austin, Kansas City, and Ogden — run scanning and character-recognition pipelines on every paper return that arrives and every paper information return their corporate filers still produce; the IRS has, since the 2023 launch of its Document Upload Tool, also processed taxpayer-submitted document images through the same character-recognition stack. The total annual document volume of the IRS’s submission processing is something on the order of five billion items, all of which pass through optical character recognition before any human eye reaches them.
Beyond the IRS, the American mortgage industry originates roughly five to eight million mortgages a year, each with a hundred to three hundred pages of supporting documentation; the entire stack is, in modern practice, scanned and OCR-extracted into the loan-origination system. The American property-and-casualty insurance industry processes about two hundred and fifty million claims a year, most involving photographed supporting documents (police reports, repair estimates, medical records) that pass through document AI before reaching an adjuster. The healthcare claims industry runs an analogous pipeline at vastly greater volume — somewhere over five billion claims a year, all involving documentary backup that is scanned and read by machine.
The deployment numbers, summed across all these domains, are an order of magnitude larger than the deployment numbers for any consumer-facing language model. Every adult American has, in the past year, had hundreds of documents read on their behalf by a character-recognition system. The system reads the documents reliably enough that, in the great majority of cases, no human reviewer is involved at any point in the chain. The deployment is older than the modern internet, older than the personal computer, and roughly as old as the consumer credit card; the first MICR check-reading systems were installed in American banks in the late 1950s, the first general-purpose document OCR in the 1980s, the first deep-learning-based document AI in the early 2010s. The pipeline has been improving incrementally for seventy years. It has, by 2026, reached a level of capability at which most documents in most domains are read more accurately by machine than by the average human reviewer the machine has replaced.
The Wrong Newsroom
One would think, given these numbers, that the AI trade press would have something to say about them. It does not. The AI press — TechCrunch, The Information, Stratechery, the AI sections of the Wall Street Journal and the Financial Times, the newsletter cottage industry of the past three years — has, as far as I have been able to find, written almost nothing about bank OCR, IRS document processing, or insurance claims triage at any point in the past five years. The deployment is too long-running to be news; the publications that cover it (American Banker, Bank Director, Insurance Journal, Mortgage Banker Magazine, the trade press of each affected industry) are not on the AI beat; the vendors involved (ABBYY, Kofax, IBM, Fiserv, FIS, Oracle, the in-house document-processing groups of the major banks) are B2B back-office software companies that do not seek out the consumer AI press. The industry, when it describes its own work, does not even use the word AI — the preferred terminology is intelligent document processing, or back-office automation, or document AI, or capture and recognition, and the language alone is sufficient to keep the work outside the AI conversation.
This is the third face of the same problem the first two correctives described. The first was emotional support, the largest single use of frontier language models in 2026 by conversation volume, uncounted by the analyst class because it does not generate API revenue. The second was industrial visual quality grading, the most thoroughly embedded use of AI in the American physical economy, uncounted because the kill floor is not in any of the rooms the analyst class visits. The third is document AI, the largest single deployment of artificial intelligence in the American information economy, uncounted because the back office of every major American institution is even more thoroughly outside the AI press’s beat than the kill floor was. The trade press visits the labs and the consumer apps; it does not visit the deposit operations centre, the IRS Submission Processing Centre, the lockbox sorting room, the insurance claims-triage office, the mortgage-origination back office. These rooms account for, by any conservative estimate, more than half of all the AI inference performed in the United States in a typical business day, and approximately none of the AI press coverage.
Why It Works in That Room
There is a quiet structural argument for why document AI became the deployment it became, while a hundred more famous applications stalled. It rests on the same three properties of the task that the slaughterhouse post identified for visual quality grading, applied here to document processing.
The first is that document reading is, at its core, a perception task and not a reasoning task. The system does not need to understand the check; it needs to read the routing number, the amount, the signature, and the date, and to flag any field that fails to match the expected pattern. There is no chain of inference, no consideration of intent, no context window of relevant prior facts. A trained vision-and-character-recognition system, given enough labelled examples, is the right tool for this shape of problem, and was the right tool well before the language-model era. The cameras of the slaughterhouse did not have to wait for the breakthroughs of the 2020s; nor did the OCR pipelines of the banks. Both were already capable enough by the early 2000s to handle the great majority of their respective workloads, and the improvements since have been refinements rather than revolutions.
The second is that the conditions of the task are unusually controlled. Every check conforms to a standard format prescribed by the American National Standards Institute and the Federal Reserve; the MICR line at the bottom uses a specific stylised font (E-13B) chosen for machine readability; the document size, paper weight, and printing standards are regulated to the millimetre. IRS forms are governed the same way: every W-2, every 1099, every K-1 conforms to a federal layout from which deviation carries a penalty. Even the unstructured side of the document — the handwritten amount on a check, the medical records in an insurance claim — passes through a pipeline that knows where in the document the unstructured content will appear and what range of values it is likely to contain. The variables that defeat consumer-grade vision systems are, on the document, engineered out of existence by the regulatory and industry standards that govern the document’s design.
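The machine readability runs deeper than the font. The nine-digit routing number on the MICR line carries its own check digit under the ABA checksum: the digits, weighted 3, 7, 1 repeating, must sum to a multiple of ten. A sketch of the validation a pipeline might apply before trusting a read:

```python
def valid_routing_number(rn: str) -> bool:
    """ABA routing-number checksum: weights 3, 7, 1 repeating
    across the nine digits must sum to a multiple of ten."""
    if len(rn) != 9 or not rn.isdigit():
        return False
    weights = (3, 7, 1, 3, 7, 1, 3, 7, 1)
    total = sum(w * int(d) for w, d in zip(weights, rn))
    return total % 10 == 0

print(valid_routing_number("021000021"))  # True  (checksum passes)
print(valid_routing_number("021000022"))  # False (last digit corrupted)
```

This is why a single misread digit in the routing field almost never posts: the 1950s standard was designed so that the most common OCR failure mode announces itself arithmetically, before any record lookup is needed.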
The third is that the supply of training data is exceptionally good. Every document in the system is associated with a downstream economic outcome — a posted balance, a tax assessment, a paid claim, a settled mortgage — that is independently verified. The ground truth for any character-recognition decision is grounded in dollars, the same way the slaughterhouse’s ground truth was grounded in the market price of the cut. Misreads are caught by reconciliation; reconciliation feeds back as training data; the model gets better. Seventy years of this feedback loop has produced a system that, at the high-volume document end, exceeds the average human reviewer it replaced.
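The reconciliation loop can be sketched in a few lines: any disagreement between the OCR read and the independently verified posted figure is both corrected and banked as a labelled training example for the next round. The record layout here is an illustrative assumption, not any vendor's schema:

```python
# Misreads caught by downstream reconciliation become ground truth.
training_queue = []  # (image_id, ocr_value, verified_value) triples

def reconcile(image_id: str, ocr_amount: str, posted_amount: str) -> str:
    """Compare the OCR read against the independently verified dollar
    figure; a disagreement is corrected and logged as a labelled
    example for the next round of model training."""
    if ocr_amount != posted_amount:
        training_queue.append((image_id, ocr_amount, posted_amount))
        return posted_amount   # the verified figure wins
    return ocr_amount

reconcile("chk-001", "125.00", "125.00")  # agreement: nothing logged
reconcile("chk-002", "725.00", "125.00")  # misread: logged for training
print(len(training_queue))  # 1
```

The point of the sketch is the asymmetry: the correction costs nothing extra, because the verified figure had to exist anyway for the books to balance, which is what makes seventy years of free labels possible.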
These three properties — perceptual task, controlled conditions, dollar-anchored labels — are not unique to bank documents. They are the structural conditions that produced the slaughterhouse’s beef-grading deployment, that have produced the warehouse-vision deployment at Symbotic and Amazon, that have produced the optical-character-recognition deployment at every utility and every government office and every payments processor in the developed world. The pattern is consistent. The depth of deployment is consistent. The invisibility to the AI press is also consistent.
What This Is Not
One ought to be plain about what this is not.
It is not a story of a profession disappeared, though some professions have been substantially reduced. The “check encoder” of the mid-twentieth century — the bank employee who keyed the amount on every paper check into the MICR encoding system — was once a category of work that employed tens of thousands of people across the American banking industry, and has been reduced over four decades to perhaps a few thousand exception-handling reviewers across the entire industry. The IRS still employs a substantial workforce in submission processing, but the same workforce now handles a document volume that is perhaps ten or twenty times what the same number of employees could have handled in 1980. The labour has been rebalanced rather than eliminated; the experienced reviewers who remain are arguably more valuable than they were in the human-only era, because their judgment is what resolves the exceptions the machine flags.
It is not a story of an industrial deployment that arrived through any sudden breakthrough. The first MICR systems date to the late 1950s; the first general-purpose document OCR to the 1980s; the deep-learning era began in the early 2010s and has produced incremental capability improvements since. The arc has been a seven-decade unbroken line of steady refinement and steady regulatory accommodation. The lesson, if there is one, is the same dull lesson the slaughterhouse post named: the most consequential AI deployments are not the ones one reads about on the day they ship.
It is not a story of an unregulated industry. American banking OCR is heavily regulated, under the Check 21 Act of 2003, under the Office of the Comptroller of the Currency’s examinations of bank operations, under the National Automated Clearing House Association’s standards, under the Federal Reserve’s payment-system rules. The IRS document-processing infrastructure is regulated by the Internal Revenue Code and audited by the Treasury Inspector General for Tax Administration. The insurance claims systems are regulated by state insurance commissioners and by the National Association of Insurance Commissioners. The depth of regulatory accommodation is itself evidence of the depth of deployment.
And it is not, finally, an argument that document AI is the only deeply-embedded AI deployment the trade press has missed. The first corrective made the case for emotional support; the second made the case for industrial visual grading; this one makes the case for document AI; the warehouse vision, the freight routing, the utility-grid optimisation, the semiconductor defect detection — each is a deployment of comparable depth that the trade press also does not cover. The plural form of the claim, made now across three correctives, is that the analyst class has, as a class, a coverage problem of considerable scope, and that the largest and most embedded AI deployments are systematically the ones it is least equipped to see.
The Plain Fact
The structural facts come out, in order, as follows.
The first is that document AI, by any measure of depth — regulatory accommodation across multiple federal and state agencies, capital expenditure across an entire industry, a combined document volume in the hundreds of millions of items on a typical business day, labour reorganisation of every major American institution that handles paper, equipment lifetime measured in decades — is the single largest deployment of artificial intelligence in the American information economy in 2026. The deployment is older than the modern internet and has been quietly improving for seventy years.
The second is that this deployment, like the slaughterhouse’s industrial visual grading and the late-night frontier-model emotional support, is invisible to the AI trade press because the AI trade press’s beat ends at the lab door and the consumer app. The rooms in which the deepest deployments actually run — the deposit operations centre, the IRS Submission Processing Centre, the lockbox sorting house, the kill floor, the late-night private chat window — are systematically not the rooms the analyst class visits. The three correctives I have now written — emotional support, industrial visual grading, document AI — are three faces of a single coverage problem, and they are far from exhaustive.
The third is the one I should like to leave plainly. The deepest AI deployments in 2026 are the ones for which the question “is this AI?” has stopped being asked. The check-imaging pipeline at the bank, the marbling camera on the kill floor, the late-night conversation with ChatGPT — each of these has, in its own way, passed beyond the conversation of what AI is and is not capable of and into the conversation of how the world now works. The trade press is, as a matter of editorial principle, not very interested in the second conversation; the public, as a matter of practical convenience, has stopped asking.
Infrastructure does not announce itself. The deepest AI deployments in 2026 have become infrastructure, and the silence with which they operate is not a failure of communication on their part; it is the natural state of a working system.
