January 2024

Our approach to text extraction with ChatGPT

Unlocking the power of machine learning for mass document analysis

At Loomery, we've been exploring the frontiers of machine learning (ML), delving into the capabilities of tools like ChatGPT. Our latest venture? Transforming the way industries handle diverse documents. These new tools and techniques are incredibly useful for when text from a myriad of documents - images, PDFs, handwritten notes - needs to be seamlessly extracted and stored in a consistent JSON format, to be used to power UI or store in a database. This isn't just a thought experiment; it's a reality we're sculpting with ML.

An example for where this will be highly valuable is in the legal sector. This would mean a transformative shift in document analysis. Lawyers often grapple with an avalanche of documents in various formats. Our ML-driven approach streamlines this process, enabling efficient analysis of digital and physical documents, whether they are typed or handwritten.

But the potential doesn't stop there. Think of the vast archives of historical documents, each a piece of the puzzle in understanding our past. Our techniques can digitise and preserve these highly valuable texts, making them more accessible for research and education. In healthcare, medical records - often a mix of digital entries and handwritten notes - can be systematically analysed, enhancing patient care and research.

Educational and research institutions are also set to benefit. Information, as we know, doesn't conform to one standard format. Our approach to these ML tools are adept at sifting through diverse text sources, streamlining the gathering of knowledge and insights.

Loomery has now had a recent client engagement where we got to apply these powerful tools and techniques: in digitising the knitting experience where we will go into more detail on our approach! 

The road to digitising the knitting experience

A client approached us at Loomery with a unique challenge: to revolutionise the knitting world by digitising printed knitting instructions for easy storage on a user’s database. It's akin to ripping songs from CDs to digital music for your iPod with iTunes, but in the realm of knitting. Imagine a service that effortlessly converts text and image-based knitting patterns from physical pages into a sleek digital library. 

The client had had some success already with OCR services but had issues getting reliable output without breaking their UI. We were tasked with making the ML process for text extraction more accurate. 

This task was a perfect match for our expertise at Loomery. We've been vigorously exploring new ML tools, and this project was a prime opportunity to put our skills to the test on a client engagement. The goal was clear - create a seamless digital bridge for knitting enthusiasts, transforming traditional patterns into an easily accessible digital format. This concept, until now, remained uncharted territory, with no straightforward automated solution available. Our journey with the client marked the beginning of an innovative chapter in bringing the cosy, intricate world of knitting into the digital age. 

Streamlining text conversion: Azure & GPT-4 Turbo in action

Over the course of a 3-day investigation, we delved into the capabilities of GPT-4 with vision (GPT-4V) to extract text from PDFs and images. However, we encountered a snag: GPT-4V, though revolutionary, wasn't quite hitting the mark. It struggled with complex layouts featuring multiple columns and interspersed images, often abbreviating or omitting crucial details. This sparked our quest for a more robust solution.

Our goal was straightforward: reliably extract text from any document or image, regardless of its layout. We faced a challenge with ChatGPT's inherent randomness in generating responses. This unpredictability, though a charm in many scenarios, wasn't suitable for our task where precision and consistency were key. Even tweaking the "temperature" setting to stabilise outputs came with drawbacks. GPT-4V's reasoning with text across various formats was inconsistent, and it lacked native PDF support, adding an extra step of converting PDFs to images.

The tech stack we were opting for. Our key advice was to integrate Azure DI into the process rather than solely relying on GPT-4V to perform the whole process.

Determined to leverage the power of GPT-4, we turned to Microsoft Azure's AI services, built on the same technology. Azure's document intelligence tool, particularly its latest v4 model, emerged as a step change from previous models. It excelled in extracting raw text from any PDF or image with a higher degree of accuracy.

One of the major challenges the client faced and asked us to overcome was parsing multiple columns on more complex layouts like this, Azure Document Intelligence v4 has no problem with this, ordering the extracted paragraphs correctly column by column, word by word.

The raw output from Azure DI, much more reliable than GPT-4V at ripping text verbatim without any abbreviations. Azure DI also puts the text in a format that is very easy for other AI tools to parse should you wish to process this text further, which we do with GPT-4 Turbo to get it into a JSON format.

The final piece of our solution involved using the raw GPT-4 Turbo text-based model. Once Azure extracted the text, GPT-4 Turbo stepped in to format it into JSON with a bit of old fashioned prompt engineering. This allowed seamless integration into their web app's UI, displaying instructions clearly and efficiently. In summary, by combining Azure's precision in text extraction with GPT-4 Turbo's formatting prowess, we achieved a reliable and effective text extraction process suitable for diverse document layouts.

An example of the JSON returned after piping the Azure DI output through GPT-4 Turbo to format the text into a consistent, structured JSON making it easy to be stored off in a database and/or power UI.

Expanding horizons: A diverse range of use cases

Our exploration into text extraction is not just limited to PDFs and images. The methodologies we've developed at Loomery have broader applications, capable of deciphering a wide spectrum of sources, including the often challenging realm of handwritten documents. Let's illuminate some potential scenarios where our techniques could be transformative:

Legal and financial document processing

Consider the legal industry, where contracts and case files often exist in a blend of typed and handwritten formats. Applying our techniques could streamline the analysis and organisation of these critical documents. The same could be said for the financial sector, awash with bank statements, invoices, and receipts, often a mix of digital and handwritten entries.

Archival digitisation

Imagine historical documents, their hand-scribed contents fading into obscurity. Our approach could breathe digital life into these relics, making centuries of knowledge searchable and accessible.

Medical record analysis

Picture a hospital with stacks of patient notes, a mix of printed forms and doctor's handwritten comments. Our text extraction methods can help digitise these records, paving the way for more efficient and accurate patient care management.

Educational resource compilation

Think of schools where educators compile varied teaching materials, including handwritten notes. Our solution could assist in creating a unified, digital repository of educational resources.

Library cataloguing systems

Libraries, repositories of printed and handwritten materials, can benefit immensely. Our methods could digitise rare manuscripts or catalogue handwritten annotations, enriching the digital library experience.

Research data compilation

For researchers handling a plethora of data sources, from field notes to journal articles, our solution can aggregate and digitise this diverse information, facilitating easier analysis and sharing.

Real estate documentation

The real estate sector deals with a myriad of documents like property deeds, agreements, and inspection reports, which are often a blend of typed and handwritten content. Implementing our text extraction solution can dramatically enhance the processing and storage of these crucial documents.

These examples underscore our belief at Loomery: technology should adapt to human needs, not the other way around. By extending our text extraction capabilities to these diverse areas, we're not just solving technical challenges; we're unlocking new opportunities for efficiency, knowledge, and growth.

Score your team against the 8Cs

Sign up below to receive a worksheet to score your team against the 8Cs, and a guide to some smart next steps based on where you score lowest.

For information on how we use your contact data, please read our Privacy Notice.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Get the latest news and views from Loomery directly to your inbox
Stay ahead of the curve with our monthly newsletter, The Weave.
Discover more insights