Meet Harshita.
Invoice Extraction with ML & RPA for enhanced order processing
📃

Introduction

Hewlett Packard has a large number of legacy customers who still place orders using Purchase Orders, which are manually processed by the order processing team. To aid this team, the Automation CoE team developed a human-in-the-loop robotic process automation (RPA) system to automate order processing. In this project, we aimed to further enhance the RPA pipeline by introducing invoice extraction using machine learning models.

To comply with my non-disclosure agreement, I have omitted and obfuscated confidential information, including names of teammates, and can only provide a brief, high-level overview of the project.

image

Key Skills

  • User Interviews
  • RPA & ML Development
  • Core Java
  • Groovy, Python
  • WorkFusion Studio

Team

Project Manager

Solutions Architect

ML Developers (including me)

Duration

6 Months

Context

About the existing RPA Pipeline referred to here as “Automation Program”:

The Issue

The Global Order Processing team processes 700K+ orders every year. About 40% of these are manual orders that arrive through generic mailboxes or other stand-alone applications as PDF Purchase Order invoices and require human intervention, which in turn reduces efficiency.

The Solution

The Automation Program is a strategic initiative to standardize and enhance the order experience. By connecting all data to a single platform, it aims to reduce manual effort and fallouts, increase process efficiency, accelerate end-to-end (E2E) order cycle time, and add value.

Challenges to the existing system

  • The existing invoice extraction process is manual and time-consuming, leading to delays, potential errors and losses for the company.
  • The invoices come in varied formats, so standard Optical Character Recognition alone is not effective for extraction.
  • Multi-lingual invoices require users who can understand the language.
  • The current RPA Order Processing pipeline is not designed to handle the complexity and variability of invoices, making it difficult to automate the extraction of relevant information.

Solutions

Implement a machine learning model for invoice extraction, using a combination of optical character recognition (OCR) and natural language processing (NLP) techniques.

Roughly followed the HCAI Design process guideline as laid out by Google:

From Google People+AI Research Guideline

Empathise

Conducted multiple discussions and interviews with the operations and business teams to understand the pain points and challenges of the users in the Orders Processing Operations Team, and to gather clear requirements from the Business team. This helped to define the scope of intelligent automation for the project. Some of the key questions we addressed include:

  • What are the challenges faced by users while processing the orders?
  • How many customers are attaching Purchase Orders as PDFs?
  • What are the volumes of these orders?
  • What additional outcomes does the Business need from the Order Processing RPA bot?

Define

Problem Statement

📌
“To use digital transformation techniques to improve the accuracy and efficiency of the invoice extraction process, and integrate it into the existing RPA pipeline.”

Handle Data Insights: dataset collection, cleaning, and preprocessing of available Purchase Orders.

Ideate

Brainstormed and evaluated various methods for invoice extraction, including OCR, NLP, and PDFBox.

Given the hundreds of purchase order samples of various formats, we concluded that the most suitable option would be to use a combination of OCR and an ML Model for each language.
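The per-language idea can be pictured as a small lookup that routes OCR text to the model registered for its language. This is only a minimal sketch: the language codes (`en`, `de`), the `MODELS` registry, and the placeholder extractor functions are all hypothetical and stand in for the actual trained models, which are not described here.

```python
from typing import Callable, Dict

# Hypothetical extractor type: takes OCR text, returns field name -> value.
Extractor = Callable[[str], Dict[str, str]]


def extract_english(text: str) -> Dict[str, str]:
    # Placeholder for a trained English model (assumption, not the real model).
    return {"language": "en", "raw_length": str(len(text))}


def extract_german(text: str) -> Dict[str, str]:
    # Placeholder for a trained German model (assumption, not the real model).
    return {"language": "de", "raw_length": str(len(text))}


# Hypothetical registry: one extraction model per language code.
MODELS: Dict[str, Extractor] = {"en": extract_english, "de": extract_german}


def extract_fields(ocr_text: str, language: str) -> Dict[str, str]:
    """Route OCR output to the model for its language; fail loudly if none exists."""
    try:
        model = MODELS[language]
    except KeyError:
        raise ValueError(f"no extraction model registered for language {language!r}")
    return model(ocr_text)
```

Keeping the models behind one lookup means a new language can be added by registering another extractor, without touching the calling code.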

Prototype & Test

Build Model

ML Model Training:

Over 150 Purchase Orders of each format were tagged by the Subject Matter Experts on the operations team with the appropriate fields to be extracted by the model. This tagged data was then used to train the model.

image

Extracted information was compared with gold data to measure each model's accuracy.

We achieved 90–97% accuracy across the models for the various formats.
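One common way to score extraction against gold data is field-level accuracy: the share of gold fields whose extracted value matches. The sketch below assumes exact matching after trimming whitespace and ignoring case, and uses made-up field names (`po_number`, `total`) and sample values. The project's actual metric is not specified here and may have differed.

```python
from typing import Dict, List


def field_accuracy(predicted: List[Dict[str, str]], gold: List[Dict[str, str]]) -> float:
    """Fraction of gold fields whose extracted value matches (case/whitespace-insensitive)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of documents")
    total = 0
    correct = 0
    for pred_doc, gold_doc in zip(predicted, gold):
        for field_name, gold_value in gold_doc.items():
            total += 1
            # A field the model did not extract counts as a miss.
            pred_value = pred_doc.get(field_name, "")
            if pred_value.strip().lower() == gold_value.strip().lower():
                correct += 1
    return correct / total if total else 0.0


# Illustrative data only (hypothetical values, not project data).
gold = [
    {"po_number": "PO-1001", "total": "250.00"},
    {"po_number": "PO-1002", "total": "99.50"},
]
predicted = [
    {"po_number": "PO-1001", "total": "250.00"},
    {"po_number": "PO-1002", "total": "95.50"},
]
print(field_accuracy(predicted, gold))  # 3 of 4 fields match -> 0.75
```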

Deploy Model

The model was integrated into the existing RPA pipeline by introducing two new processes: an OCR Bot and an ML Extraction Bot.
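The two added steps can be sketched as consecutive stages that pass an order record along. This is an illustration only, not the WorkFusion implementation: `Order`, `ocr_bot`, `ml_extraction_bot`, and `run_pipeline` are hypothetical names, the "OCR" simply decodes text bytes, and a regular expression stands in for the trained model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Order:
    pdf_bytes: bytes
    text: str = ""
    fields: Dict[str, str] = field(default_factory=dict)


def ocr_bot(order: Order) -> Order:
    # Stand-in for a real OCR engine: the "PDF" is assumed to be plain UTF-8 text.
    order.text = order.pdf_bytes.decode("utf-8")
    return order


def ml_extraction_bot(order: Order) -> Order:
    # Stand-in for the trained model: a regex that finds a PO number.
    match = re.search(r"PO[- ]?(\d+)", order.text)
    if match:
        order.fields["po_number"] = match.group(1)
    return order


# The two new stages, run in order.
PIPELINE: List[Callable[[Order], Order]] = [ocr_bot, ml_extraction_bot]


def run_pipeline(order: Order) -> Order:
    for step in PIPELINE:
        order = step(order)
    return order


result = run_pipeline(Order(pdf_bytes=b"Purchase Order PO-4711 for 3 units"))
print(result.fields)  # {'po_number': '4711'}
```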

Bot Process:

image

Human-in-the-loop, “Workspace”

Designed a platform using WorkFusion, in which the order management team members can review, re-tag and enhance the invoice extraction.

image

If tagging is required, the extracted data is not mapped as expected, or extraction fails, a human can intervene in this step.

The information tagged by the human is used for retraining, and the model is retrained at intervals so that it continues to evolve and its accuracy increases.
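The review-and-retrain loop can be sketched as a routing rule plus a buffer of human corrections. Everything named here is an assumption for illustration: the required fields, the confidence score and its 0.9 threshold, and the in-memory `retraining_examples` list are hypothetical and not taken from the WorkFusion setup.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

REQUIRED_FIELDS = ("po_number", "total")  # hypothetical required fields
CONFIDENCE_THRESHOLD = 0.9                # hypothetical cut-off for auto-acceptance


@dataclass
class Extraction:
    document_id: str
    fields: Dict[str, str]
    confidence: float


def needs_review(extraction: Extraction) -> bool:
    """Send to a human if a required field is missing/empty or confidence is low."""
    missing = any(not extraction.fields.get(name) for name in REQUIRED_FIELDS)
    return missing or extraction.confidence < CONFIDENCE_THRESHOLD


# Human-corrected examples collected for the next periodic retraining run.
retraining_examples: List[Tuple[str, Dict[str, str]]] = []


def process(extraction: Extraction,
            human_fields: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return the final fields; human corrections are kept for retraining."""
    if not needs_review(extraction):
        return extraction.fields
    if human_fields is None:
        raise ValueError("extraction needs human review but no corrected fields were supplied")
    retraining_examples.append((extraction.document_id, human_fields))
    return human_fields
```

For example, an extraction missing `total` would be routed to review, and the reviewer's corrected fields would both be returned and appended to `retraining_examples`.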

Results & Conclusion

The implementation of the machine learning model for invoice extraction significantly reduced the manual effort required for the process, improving accuracy and efficiency. It also helped to accelerate the order processing cycle time. Overall, the Automation Program was successful in achieving its goals of standardizing and improving the order experience.


©harshitashyale
