Why OCR when the data is in front of you? It is an interesting question, and the likely answer is that you didn’t know such data existed. Picture the outdated accounting department of the 90s: stacks of invoices piled on desks, the printer whirring continuously and huge filing cabinets at bursting point. If we asked the workers what would fix the department’s problems, what would they say? “A bigger filing cabinet” or “a more efficient printer/scanner.” It’s unlikely that anyone would have had the foresight to suggest sending all invoices electronically and processing them without printing or scanning.
The paperless office has been promised for almost 20 years but, unfortunately, it has yet to become a reality. In fact, 80% of all invoicing in the U.S. is still paper-based, so something is clearly wrong. To me, the biggest obstacle to e-invoicing and e-document processing is supplier adoption. That is why I believe e-invoicing and e-document processing technology should be free for suppliers, easy to use and non-disruptive, as only then will suppliers move away from paper.
There is no doubt that the business and environmental impact of practices such as paper invoicing and paper purchase orders has rendered them unsustainable and hugely unproductive, and that e-invoicing and e-documents are a modern, innovative solution to the paper problem.
Optical character recognition (OCR) offers one way of enabling a computer to interpret a human-readable document and to extract data from it. Originally, we had document scanning and archiving; then OCR technology allowed us to extract data from scanned images of structured and semi-structured documents. Capture evolved into intelligent capture with the introduction of so-called ‘learning algorithms’, while vendors started to evaluate themselves and the competition in terms of recognition and straight-through processing rates.
But is OCR still relevant today? Do we really need to OCR, and incur the mistakes that OCR makes, when the data is carried within most documents received into an organization? As I said, the likely answer is that you didn’t know such data existed. Even with various technological advances and fierce competition among industry leaders, scanning and OCR are, and will always be, a flawed approach to the conversion of 'human readable documents' into 'machine readable structures':
- Scanning: First, paper needs to be converted into a format that can be processed by an OCR platform. This is a manually intensive process: mail is received, opened and sorted; staples are removed and batches prepared; paper documents are scanned and the originals either archived or destroyed.
- OCR: The scanning function feeds the OCR platform, which reads the scanned image of the document and attempts to convert its pixels into meaningful characters. OCR companies often boast about high recognition rates, but the variables affecting success are often outside their control: poor-quality paper used by the supplier, the way data is laid out on the document, the quality of the scanner used and even the way the paper has been folded in the envelope can all affect OCR results and make the most powerful platforms close to useless.
- Quality control: Irrespective of how good the image is, you still need an operator to check OCR results and either correct what’s been captured or fill in what’s missing.
So, where is the data? Most organizations now send and receive PDF documents via email. It is the easiest and most efficient way to send documents, such as invoices and orders, as the functionality is ‘out of the box’ with modern billing and procurement applications.
There is no question that email and PDF are ubiquitous. However, what many may not be aware of is that when an application generates a PDF, in almost all instances the data, such as invoice number, line quantity and amounts, will be embedded within the PDF, put there by the generating application. This method guarantees data quality and removes the manual activities and risks associated with scanning and OCR. Now that you know where the data is stored, you can automatically map it to an e-document structure that’s compatible with your processing application.
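To make the mapping step concrete, here is a minimal sketch in Python. It assumes the text embedded in an application-generated PDF has already been pulled out as a plain string by a PDF text-extraction library, and that the invoice uses the simple labels and line layout shown in the sample. The field names, patterns and the `map_invoice_text` helper are hypothetical illustrations, not any vendor's actual method; a real supplier layout would need its own rules.

```python
import re
from decimal import Decimal

# Hypothetical patterns for one assumed invoice layout.
INVOICE_NUMBER = re.compile(
    r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE
)
TOTAL = re.compile(r"\bTotal\s*:?\s*([0-9][0-9,]*\.[0-9]{2})", re.IGNORECASE)
# Assumed line layout: "<quantity> <description> <amount>" on a single line.
LINE_ITEM = re.compile(
    r"^(?P<qty>\d+)[ \t]+(?P<description>.+?)[ \t]+"
    r"(?P<amount>[0-9][0-9,]*\.[0-9]{2})$",
    re.MULTILINE,
)


def to_decimal(value: str) -> Decimal:
    """Convert an amount such as '1,234.50' to a Decimal."""
    return Decimal(value.replace(",", ""))


def map_invoice_text(text: str) -> dict:
    """Map text taken from a PDF to a simple e-document structure.

    Fields that cannot be found are set to None rather than guessed.
    """
    number = INVOICE_NUMBER.search(text)
    total = TOTAL.search(text)
    lines = [
        {
            "quantity": int(m.group("qty")),
            "description": m.group("description"),
            "amount": to_decimal(m.group("amount")),
        }
        for m in LINE_ITEM.finditer(text)
    ]
    return {
        "invoice_number": number.group(1) if number else None,
        "total": to_decimal(total.group(1)) if total else None,
        "lines": lines,
    }


if __name__ == "__main__":
    # Sample text standing in for what a PDF library would return.
    sample = (
        "Invoice Number: INV-1001\n"
        "2 Widget 19.98\n"
        "1 Gadget 5.00\n"
        "Total: 24.98\n"
    )
    print(map_invoice_text(sample))
```

Because the characters come straight from the generating application rather than from recognised pixels, there is no recognition step to get wrong; the remaining work is agreeing the mapping rules for each supplier layout.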
As this approach is so simple and non-disruptive to any supply chain, adoption rates are extremely high when an organization promotes this method of e-invoicing. So again… why OCR? Of course, it would be foolhardy to predict that there will ever be a truly paperless office. Some paper will likely remain, at least in the short term. However, since most billing applications can generate and send PDF invoices via email, it is the easiest and quickest way to move closer to a paperless office.