Introduction
Hewlett Packard has a large number of legacy customers who still place orders using Purchase Orders, which the order processing team handles manually. To support this team, the Automation CoE (Center of Excellence) team developed a human-in-the-loop robotic process automation (RPA) system that automates order processing. In this project, we aimed to further enhance that RPA pipeline by adding invoice extraction using machine learning models.
To comply with my non-disclosure agreement, I have omitted and obfuscated confidential information, including names of teammates, and can only provide a brief, high-level overview of the project.
Key Skills
- User Interviews
- RPA & ML Development
- Core Java
- Groovy, Python
- WorkFusion Studio
Team
Project Manager
Solutions Architect
ML Developers (including me)
Duration
6 Months
Context
About the existing RPA pipeline, referred to here as the “Automation Program”:
The Issue
The Global Order Processing team processes over 700K orders every year. About 40% of these are manual orders that arrive through generic mailboxes or other standalone applications as PDF Purchase Order invoices. Each of these requires human intervention, which reduces efficiency.
The Solution
The Automation Program is a strategic initiative to standardize and improve the order experience by connecting all order data to a single platform. Its goals are to reduce manual effort and order fallouts, increase process efficiency, and accelerate end-to-end (E2E) order cycle time.
Challenges with the existing system
- The existing invoice extraction process is manual and time-consuming, leading to delays, potential errors and losses for the company.
- Invoices come in varied formats, so standard Optical Character Recognition (OCR) alone is not effective for extraction.
- Multilingual invoices require users who understand each invoice's language.
- The current RPA Order Processing pipeline is not designed to handle the complexity and variability of invoices, making it difficult to automate the extraction of relevant information.
Solutions
Implement a machine learning model for invoice extraction, using a combination of optical character recognition (OCR) and natural language processing (NLP) techniques.
We roughly followed the human-centered AI (HCAI) design process as laid out by Google:
Empathise
We conducted multiple discussions and interviews with the operations and business teams to understand the pain points of users on the Order Processing Operations team and to gather clear requirements from the business team. This helped define the scope of intelligent automation for the project. Key questions we addressed included:
- What are the challenges faced by users while processing the orders?
- How many customers are attaching Purchase Orders as PDFs?
- What is the volume of these orders?
- What further outcomes does the business need from the Order Processing RPA bot?
Define
Problem Statement
Handle data insights: collect, clean, and preprocess a dataset from the available Purchase Orders.
Ideate
We brainstormed and evaluated various methods for invoice extraction, including OCR, NLP, and PDFBox (a Java library for working with PDF documents).
Given hundreds of purchase order samples in varied formats, we concluded that the most suitable option was a combination of OCR and a separate ML model for each language.
Prototype & Test
Build Model
ML Model Training:
Subject Matter Experts on the operations team tagged over 150 Purchase Orders of each format with the fields the model should extract. This tagged data was then used to train the model.
The model's extracted information was compared against gold-standard data to measure its accuracy.
All models, across the various formats, achieved 90 to 97% accuracy.
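The case study does not say exactly how accuracy against gold data was computed. As one plausible reading, the minimal sketch below assumes field-level accuracy: the fraction of gold-tagged fields whose extracted value matches after light normalization (whitespace and case). The field names (`po_number`, `order_date`, `total`) and the `normalize` rule are hypothetical, chosen for illustration only.

```python
from typing import Dict, List

# Hypothetical record type: field name -> extracted (or gold) string value.
Record = Dict[str, str]


def normalize(value: str) -> str:
    """Assumed normalization: collapse whitespace and ignore case."""
    return " ".join(value.split()).casefold()


def field_accuracy(predicted: List[Record], gold: List[Record]) -> float:
    """Fraction of gold fields whose predicted value matches after normalization.

    Assumes predicted[i] and gold[i] describe the same Purchase Order.
    A field missing from the prediction counts as an error.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned one-to-one")
    total = correct = 0
    for pred, ref in zip(predicted, gold):
        for name, gold_value in ref.items():
            total += 1
            if normalize(pred.get(name, "")) == normalize(gold_value):
                correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up example data, not figures from the project.
    gold = [
        {"po_number": "PO-1001", "order_date": "2021-03-04", "total": "1,250.00"},
        {"po_number": "PO-1002", "order_date": "2021-03-05", "total": "980.50"},
    ]
    predicted = [
        {"po_number": "PO-1001", "order_date": "2021-03-04", "total": "1,250.00"},
        {"po_number": "po-1002 ", "order_date": "2021-05-03"},  # wrong date, missing total
    ]
    print(f"{field_accuracy(predicted, gold):.2%}")  # 4 of 6 fields match -> 66.67%
```

Scoring per field rather than per document means that one wrong date does not erase credit for the other fields on that invoice. A stricter document-level metric would count an invoice as correct only if every field matches.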
Deploy Model
The model was integrated into the existing RPA pipeline by introducing two new processes: an OCR Bot and an ML Extraction Bot.
Bot Process:
Human-in-the-loop, “Workspace”
We designed a platform in WorkFusion where order management team members can review, re-tag, and correct invoice extractions.
A human can intervene at this step if tagging is required, if the extracted data does not map as expected, or if extraction fails.
The human-tagged information is used as training data, and the model is retrained at intervals so that it keeps evolving and improving in accuracy.
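The routing and retraining loop above can be sketched in plain Python. This is not WorkFusion's API; it is a minimal illustration under stated assumptions. The per-field confidence scores, `CONFIDENCE_THRESHOLD`, `REQUIRED_FIELDS`, and the batch-size retraining trigger are all hypothetical, since the case study only says a human intervenes on failures or unexpected mappings and that retraining happens at intervals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical values: the real fields, thresholds, and retraining schedule are not described.
REQUIRED_FIELDS = ("po_number", "order_date", "total")
CONFIDENCE_THRESHOLD = 0.8
RETRAIN_BATCH_SIZE = 3


@dataclass
class Extraction:
    doc_id: str
    fields: Dict[str, str]        # field name -> extracted value
    confidence: Dict[str, float]  # field name -> assumed model confidence in [0, 1]
    error: Optional[str] = None   # set when extraction fails outright


def needs_review(ex: Extraction) -> bool:
    """Route to the human Workspace if extraction failed, a required field is
    missing (mapping not as expected), or a required field has low confidence."""
    if ex.error is not None:
        return True
    for name in REQUIRED_FIELDS:
        if name not in ex.fields:
            return True
        if ex.confidence.get(name, 0.0) < CONFIDENCE_THRESHOLD:
            return True
    return False


@dataclass
class RetrainingQueue:
    """Collects human-corrected documents and reports when a retraining batch is ready."""
    batch_size: int = RETRAIN_BATCH_SIZE
    pending: List[Dict[str, str]] = field(default_factory=list)

    def add_correction(self, corrected_fields: Dict[str, str]) -> bool:
        self.pending.append(corrected_fields)
        return len(self.pending) >= self.batch_size

    def drain(self) -> List[Dict[str, str]]:
        # Hand the collected corrections to a (hypothetical) retraining job and reset.
        batch, self.pending = self.pending, []
        return batch


if __name__ == "__main__":
    ok = Extraction("po-1", {"po_number": "PO-1", "order_date": "2021-01-01", "total": "10.00"},
                    {"po_number": 0.99, "order_date": 0.95, "total": 0.91})
    low = Extraction("po-2", {"po_number": "PO-2", "order_date": "2021-01-02", "total": "7.00"},
                     {"po_number": 0.99, "order_date": 0.42, "total": 0.93})
    print(needs_review(ok), needs_review(low))  # False True
```

Batching corrections rather than retraining on every edit matches the "retrained at intervals" approach: each retraining run sees a meaningful amount of new human-tagged data.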
Results & Conclusion
The implementation of the machine learning model for invoice extraction significantly reduced the manual effort required for the process, improving accuracy and efficiency. It also helped to accelerate the order processing cycle time. Overall, the Automation Program was successful in achieving its goals of standardizing and improving the order experience.