The Final Frontier: Turning Documents into Data

January 4, 2017, by Sid Probstein, CTO & VP Professional Services, AI Foundry  

I recently applied for a home mortgage. The process could not have been simpler; I was pre-approved in a day or two based on the application and a quick initial consultation. The next day I began getting emails from an associate at the bank. His instructions were crystal clear: “print these out, sign them, take a picture with your phone and email them back to me.”  It was like having a concierge. Or so it seemed.

A week later, I was scheduled to go on vacation in Europe and did not want to worry about my mortgage application. I called the bank associate first thing Friday morning. The anxiety my request produced was palpable. He said they would call back shortly; the branch manager reached me at 4:30 pm sharp. “We think we have everything we need,” the branch manager said. It was not reassuring. I asked a couple of questions and quickly got to the bottom of the problem. There’s no system in place to track documents; the system is what the documents are fed into. “Everything is spread across five desks, in verification right now,” the manager said. “But we think we have everything we need.”

Everything went fine in the end. But I started thinking of this concierge-like process from the perspective of the bank. Where did the associate save my files? What if their laptop died, or was stolen? How many loans are waiting to get into the system at any given time? And on average, how long does that process take?

Answering the Questions
There is an old axiom, most often attributed to Peter Drucker, that you can’t manage what you can’t measure. Most businesses have invested millions in IT over the past decades, and most of that investment has gone into dashboards and reports that provide a historical view of activity, summarizing the products and services sold.

The bank system in my example knows how many loans it is processing and how many it has approved in the past month. What it doesn’t know is how many forms had to be re-requested or re-keyed before entry; how many letters had to be sent by U.S. mail to request social security numbers or other critical PII; or how many trips to the bank the customer made. The big question for banks is: how much effort are your customers going through when your process isn’t simple and easy?

For processes that depend on documents, such as loans, mortgages, claim processing, insurance applications and numerous other financial services products, as well as regulated industries like manufacturing, health care and life sciences, turning documents into data is the only way to measure, manage and ultimately optimize the process and customer experience (CX).

Consider these examples from across industries:

  • Insurance applications sold at kiosks or by contractors are primarily faxed to insurers, and all faxed data is manually keyed into application systems for security reasons
  • Claims processors rely on USPS or fax for photos and documents related to claims, and end up facing the insured in court, who is generally armed with higher-quality smartphone photos and videos
  • Manufacturers store data for 30+ years to ensure compliance with various regulations, largely because it is too difficult to manually determine what data should be retained and what can be safely discarded
  • Migration to the cloud, and between SaaS and cloud providers, is fraught with risk, as scans and images, to say nothing of digital documents, may contain personal information that could expose the company to liability
  • Critical documents are mis-filed, lost, incorrectly keyed, labelled, re-created or otherwise manually managed some 43% of the time (across all industries)

New Technologies Beyond OCR
Turning documents into data allows these issues to be resolved. By using new technologies like visual classification & extraction, it is possible to operate at scale, even processing huge backlogs, without excessive human intervention, at reasonable time and cost.

Use Case: Loan Origination/Secure Collaboration

Visual classification is a breakthrough technology that organizes documents the way humans do when they first look at something: by the way it looks. The enterprise is awash in forms and templates as well as important consumer documents. Digitizing these and extracting the relevant data from them, using a combination of human-directed and machine learning, makes it possible to report on their state; route them into business processes like LOS, POS or other application management or customer onboarding systems; and track every document along the way.


More than 100 image types and 600+ document formats are automatically recognized and categorized. Training the visual classifier to handle other formats takes a few hours, given a sample of documents. Extraction is similar: it combines visual content analysis and multi-OCR voting with sophisticated image separation and region detection, and training it takes roughly one hour per field. For moderate-quality scans (150 dpi or higher), extraction rates on printed text are above 99%, and can be similarly superb on hand-printed text, especially for numbers, checkboxes, etc.
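
To make the idea of multi-OCR voting a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any vendor's implementation: the engine names, the sample readings, the `vote_field` helper and the simple majority rule are all assumptions made for the example.

```python
from collections import Counter


def vote_field(readings, min_agreement=2):
    """Pick the value that the most OCR engines agree on for one field.

    `readings` maps a (hypothetical) engine name to the text it read.
    Returns (value, needs_review). needs_review is True when fewer than
    `min_agreement` engines agree, so a person should check the field.
    """
    # Collapse runs of whitespace so trivially different readings match,
    # and ignore engines that returned nothing.
    normalized = [" ".join(text.split()) for text in readings.values() if text.strip()]
    if not normalized:
        return None, True
    value, votes = Counter(normalized).most_common(1)[0]
    return value, votes < min_agreement


# Three assumed engines read the "loan amount" field; one misreads a digit.
readings = {"engine_a": "248,500", "engine_b": "248,500", "engine_c": "243,500"}
value, needs_review = vote_field(readings)
print(value, needs_review)  # 248,500 False
```

Fields where the engines disagree are flagged rather than guessed, which is one way the human-directed part of the process could be limited to the documents that actually need attention.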

Why Now for Turning Documents into Data?
Not every business will find an ROI based on storage costs. In many cases, storage of the original is required for regulatory purposes. A far more compelling reason is the user experience. Go to the corner of a nice downtown area in the city nearest you. Chances are you’ll see a half-dozen banks offering services. The same is true across most industries: insurance, pharmaceutical, financial services, automotive, transportation, you name it. There is robust competition everywhere. And if you can’t gain margin to pass along to the customer by cutting storage costs, then maybe you can compete on experience, because a great customer experience means a lot to the bottom line.

Imagine if, during my mortgage application process, I had been able to use an app provided by the bank, and as I uploaded each document the app displayed a big green check mark to show that the document was received and correctly processed. Within a few minutes of sending everything required, in photos of good enough quality, I should be able to know that everything has been provided. Over the next few days I should get further updates, verifying that all information is correct, and ultimately that my application has been passed to underwriting successfully.
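
The status reporting behind that green check mark can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `REQUIRED` checklist, the document type names and the `Application` class are hypothetical, and a real system would derive the `readable` flag from classification and extraction rather than take it as an input.

```python
from dataclasses import dataclass, field

# Hypothetical checklist of documents a mortgage application needs.
REQUIRED = ("pay_stub", "tax_return", "bank_statement")


@dataclass
class Application:
    # Maps document type -> "ok" (green check) or "retake" (photo unusable).
    received: dict = field(default_factory=dict)

    def record_upload(self, doc_type, readable):
        """Record an upload; `readable` stands in for the processing result."""
        self.received[doc_type] = "ok" if readable else "retake"

    def status(self):
        """Report what is still missing and what must be photographed again."""
        missing = [d for d in REQUIRED if d not in self.received]
        retake = [d for d in REQUIRED if self.received.get(d) == "retake"]
        return {"complete": not missing and not retake,
                "missing": missing, "retake": retake}


app = Application()
app.record_upload("pay_stub", readable=True)
app.record_upload("tax_return", readable=False)
print(app.status())
# {'complete': False, 'missing': ['bank_statement'], 'retake': ['tax_return']}
```

The same `status()` answer could serve both the customer's app and the branch manager, which is the point: once documents are data, "we think we have everything" becomes a yes or no.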

And here’s the challenge. Building a great digital experience is hard. The competition is not the bank across the street. It’s Amazon, Dominos, Yelp, Google, etc., all showing how easy things can be. Banks should fear the day that a non-banking enterprise decides to enter the banking industry, because its disruptive approach will eliminate much of the current landscape.

But those companies don’t know how to find customers, let alone underwrite their loans. Those aren’t simple things to do, either. Rather than trying to beat them at their own game, banks should focus on turning the concierge-like process I described earlier into a managed process. It’s not a bad one, from the end-user perspective. It’s only problematic when I need status back. Turning documents into data provides a great CX simply by making that status reporting possible. Even if you only made it available to the bank managers and associates, it would be an improvement over the current chaos of the loan application process.

Contact Sid Probstein

Follow on Twitter @sidprobstein