With AI becoming increasingly popular in recent years, many are wondering whether it is over-hyped and over-used. While time and the market will eventually sort the hype from the lasting value, we will continue to pragmatically build AI products that solve real-world problems that would otherwise be unapproachable. To make this text concrete, alongside our AI development framework I will also present an example of how it was applied in practice to build an AI-based invoice scanner. This is the first blog post in our AI blog series, where we'll also cover topics such as Data Mining, Data Engineering and Process Optimisation.
What are we solving?

Whenever we consider using AI in a solution, we start by clearly defining the core problem we are solving and its technical restrictions. As is the case for any well-posed problem, we must define the input and the output - here the input is an invoice as a file (PDF or image), and the output is a domain-specific key-value mapping (e.g. customer email, biller bank details). Multiple functions may satisfy this requirement, and we evaluate and rank them using business-specific metrics: speed, cost and accuracy.
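The input/output contract above can be written down as a minimal sketch. The field names and the `scan_invoice` signature are illustrative assumptions, not Adfin's actual domain model:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical output schema: the domain-specific key-value mapping
# the scanner must produce. Field names are examples only.
@dataclass
class InvoiceData:
    customer_email: Optional[str] = None
    biller_bank_details: Optional[str] = None
    total_amount: Optional[str] = None
    due_date: Optional[str] = None

def scan_invoice(file_bytes: bytes, mime_type: str) -> InvoiceData:
    """Placeholder for the model-backed extraction step."""
    raise NotImplementedError

# The contract alone is enough to evaluate candidate implementations:
example = InvoiceData(customer_email="billing@example.com")
print(example.customer_email)  # billing@example.com
```

Pinning the schema down first means every candidate model can be scored against the same expected outputs.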
When should we use AI?

As stated above, we should use AI whenever we want to solve an otherwise unapproachable problem. Although this sounds like common sense, there are plenty of cases of LLMs being used instead of a regular expression, and vision models replacing basic image-processing snippets. While AI solutions can be reliable, 99% accuracy still leaves room for surprises - deterministic, static solutions are often more reliable and maintainable if they meet the requirements. Scanning invoices is hard: before the current generation of AI models, entire start-ups were founded just to solve this single problem - strong evidence that invoice ingestion is too difficult for a purely engineering solution.
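To illustrate the "regular expression instead of an LLM" point: for a narrow, well-defined sub-task such as pulling an email address out of raw invoice text, a deterministic snippet is cheaper, faster and fully predictable. This is a generic sketch, not code from the scanner:

```python
import re

# Deterministic extraction of an email address - no model call needed.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

text = "Questions? Contact billing@example.com within 30 days."
match = EMAIL_RE.search(text)
print(match.group(0) if match else None)  # billing@example.com
```

The full invoice, with its variable layouts and scanned images, is where such snippets break down and an AI component earns its keep.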
What AI model should we use?

At the moment there is a significant number of AI model providers (including open-source models), each offering multiple models with different trade-offs in speed, cost and accuracy. Selecting a single model for a product is not trivial. Of course, public benchmarks and Elo leaderboards may guide our decision, but the no-free-lunch theorem tells us that relying on them alone may trick us into a sub-optimal solution for our particular problem. Our solution to this model-selection meta-problem is to develop problem-specific automatic evaluation benchmarks. These allow us to empirically measure the speed, cost and accuracy of each model for our specific use case.
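A problem-specific benchmark of this kind can be sketched in a few lines. Here `models` maps a model name to a callable that scans one invoice, and the labelled dataset and per-call costs are stand-in assumptions:

```python
import time

def benchmark(models, labelled_invoices, cost_per_call):
    """Measure accuracy, average latency and total cost per model
    on a labelled, problem-specific dataset."""
    results = {}
    for name, scan in models.items():
        correct = 0
        start = time.perf_counter()
        for invoice, expected in labelled_invoices:
            if scan(invoice) == expected:
                correct += 1
        elapsed = time.perf_counter() - start
        results[name] = {
            "accuracy": correct / len(labelled_invoices),
            "avg_seconds": elapsed / len(labelled_invoices),
            "cost": cost_per_call[name] * len(labelled_invoices),
        }
    return results

# Toy run with a stub "model" that gets one of two invoices right:
dataset = [("inv1", {"total": "10"}), ("inv2", {"total": "20"})]
models = {"model_a": lambda inv: {"total": "10"} if inv == "inv1" else {"total": "99"}}
print(benchmark(models, dataset, {"model_a": 0.002}))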
The choice of a single model is still not straightforward: the output of a multi-objective optimisation is a front of non-dominated solutions, whereas we require a clear ranking. In such cases, user preference decides the final ranking. For the invoice scanner, the business acceptance criteria defined the preference rules: the ingestion cost per invoice must be below X cents, the ingestion time must be below Y seconds, and the remaining models are ranked by accuracy.
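Applying those preference rules to a Pareto front reduces to a filter followed by a sort. The threshold values and candidate numbers below are placeholders for the business limits X and Y, not the real acceptance criteria:

```python
# Placeholder business thresholds (the X and Y mentioned above).
MAX_COST_CENTS = 5
MAX_SECONDS = 30

# Hypothetical non-dominated candidates from the benchmark.
candidates = [
    {"name": "model_a", "cost_cents": 3, "seconds": 12, "accuracy": 0.97},
    {"name": "model_b", "cost_cents": 9, "seconds": 8,  "accuracy": 0.99},
    {"name": "model_c", "cost_cents": 2, "seconds": 25, "accuracy": 0.95},
]

# 1) Keep only models meeting the hard cost and latency limits.
feasible = [m for m in candidates
            if m["cost_cents"] <= MAX_COST_CENTS and m["seconds"] <= MAX_SECONDS]
# 2) Rank the survivors by accuracy, best first.
ranked = sorted(feasible, key=lambda m: m["accuracy"], reverse=True)
print([m["name"] for m in ranked])  # ['model_a', 'model_c']
```

Note that the most accurate candidate overall is eliminated by the cost limit - exactly the situation where public leaderboards alone would have misled us.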
Build around the AI core

Similar to the famous expression of mathematics authors, "the proof is trivial and is left as an exercise to the reader", an AI Engineer might state that the AI core is validated and the rest of the software product is left to the Software Engineering team. Given the integration complexities usually involved, both statements can only be taken as jokes. To be clear, at Adfin an AI Engineer's role does not end with the proof of concept; it extends to developing products from the initial idea all the way to our clients.
AI integration comes with unique challenges beyond standard software development concerns. Let’s discuss them one by one:
Model availability: Outages from AI providers are out of our hands - but our reaction is not. We build AI products that support multiple providers, making it easy to create fallback mechanisms that span different providers.

Surprising responses: The entire point of using an AI component is to solve difficult problems, which translates into highly variable inputs that may lead to unexpected responses. A well-known example is making slight changes to traffic signs in order to trick the vision models of self-driving cars. In the absence of ground truth, we log all responses and improve iteratively.

Privacy concerns: Financial data is particularly sensitive and must be handled with great care. When building AI products, Adfin only considers model providers that comply with strict privacy and data governance standards.

Scalability: All LLM providers impose maximum concurrency limits, creating scalability bottlenecks. Our solution is an extension of the fallback mechanism: whenever we approach the concurrency limit of one provider, we route subsequent requests to another - again, in the order of our preference.

User experience: AI takes time. While the model runs, the user interface should reflect that (e.g., show a spinning "in progress" icon or a progress bar). If a response takes too long, we notify users via email to keep them in the loop even after the session ends.

AI can be transformative when applied thoughtfully to hard problems. At Adfin, we view AI as part of a broader product - where business needs, user experience and system reliability are all part of the design. Our invoice scanner demonstrates the impact of this approach: compared to our earlier version built with OCR and text-only models, the new scanner - developed using our AI framework - processes invoices 3x faster, at 1/20th of the cost, and with an error rate reduced from 20% to just 1%.
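The cross-provider fallback and concurrency-aware routing described in the list above can be sketched as follows. Provider names, limits and the `call` functions are illustrative assumptions:

```python
class Provider:
    """A model provider with a concurrency budget and a call function."""
    def __init__(self, name, max_concurrency, call):
        self.name = name
        self.max_concurrency = max_concurrency
        self.in_flight = 0
        self.call = call

def route(providers, request):
    """Try providers in preference order; skip saturated ones and
    fall back past failures."""
    for p in providers:
        if p.in_flight >= p.max_concurrency:
            continue  # near this provider's limit: try the next one
        p.in_flight += 1
        try:
            return p.name, p.call(request)
        except Exception:
            continue  # provider outage: fall back to the next provider
        finally:
            p.in_flight -= 1
    raise RuntimeError("all providers unavailable or saturated")

# Toy run: the preferred provider is saturated, so traffic routes to the backup.
primary = Provider("primary", max_concurrency=0, call=lambda r: "ok")
backup = Provider("backup", max_concurrency=2, call=lambda r: "ok")
print(route([primary, backup], {"invoice": "..."}))  # ('backup', 'ok')
```

A production version would track in-flight counts with proper synchronisation (or an async semaphore) and add retries with backoff, but the preference-ordered loop is the core of the mechanism.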