AI study android in front of a chalkboard filled with scientific equations - made with DALL·E

AI comes of age on language

September 2024

A dialogue with co-founders of Malted AI

Introduction

AI has been dominating the investment landscape in recent months as tools such as ChatGPT introduce AI’s capabilities to the general public. Meanwhile, press coverage ranges from the apocalyptic (the end of the world) to the messianic (a coming productivity and innovation miracle that will save the world). Capital has certainly been flowing to AI, with private investment estimated at approximately $120bn last year, of which the US accounted for about two thirds. So far as public capital is concerned, the numbers are even more striking: one need only consider the sales volume growth and accompanying market capitalisation expansion of Nvidia, the prime supplier of GPU chips for AI applications.

Against this backdrop, it should not be surprising that few corporate announcements fail to mention AI and it would be impossible to believe there is anyone unaware of the coming wave. However, when one meets with corporates or reads most of the press, what is striking is the lack of precision in the discussion. AI is commonly referred to in generic terms despite it being both multi-faceted and at various stages of development depending upon which facet one is referring to. This is natural for an emergent technology but to understand the potential ramifications much more granularity is required. As a starting point, it is helpful to spend some time with people who are at the leading edge of recent developments and try to explore some of the obvious questions with them.

This paper focuses on a simple practical example, but one at the centre of AI development. In 2022 the team from Glasgow University won the prestigious Amazon Alexa Prize Taskbot Challenge. Three members of this team recently launched Malted AI, a new company to develop the technology for the corporate sector.

The Team

Malted AI’s founders, Iain Mackie, Carlos Gemmel and Federico Rossetto, are leading PhDs with demonstrated expertise in applying AI to real-world information problems. They came together four years ago while researching large language models (LLMs) under Dr Jeff Dalton, the University’s Alan Turing Fellow. They bring complementary commercial, machine learning and software skillsets. The founders recently won the multi-million-dollar Amazon Alexa Prize Challenge, using the technology that inspired what Malted AI is today, beating 125 of the leading AI teams from around the globe and being recognised as a Sequoia FF100 ‘Rising Star’. Headquartered in Scotland, it is the team’s ambition to become a global leader in AI.

 


 

The Company

Malted AI provides enterprises with bespoke AI applications combining small language models (SLMs) with high-quality data from its distillation technology. Its AI models become experts at domain-specific problems, resulting in high factual accuracy and models that are 10 to 100 times smaller than large language models. Malted AI’s vision is to create significant value within enterprise environments by addressing complex problems that general AI cannot solve due to its generic approach, lack of high-quality data and high computing costs.

(More detail on Malted AI is set out in the Appendix at the end of this report.)


Explaining the jargon

Large language models (LLMs), small language models (SLMs), distillation and synthetic data are key terms used as shorthand to describe what you do, but their meaning is not intuitively obvious. For the uninitiated, would you mind giving a layman’s explanation?

LLMs and SLMs

  • Both large language models (LLMs) and small language models (SLMs) are machine learning models that have shown advanced capabilities to understand, generate and manipulate language in a human-like way. These models are trained on large quantities of data to acquire language patterns through statistical relationships.
  • Given an input, for example a question, a language model is capable of composing an answer (output) by predicting which word follows which, based on the associations learned from the data used to train the model.
  • Each model is defined by its parameters, which affect its size, complexity and ability to handle data. Larger models, with more parameters, can capture more intricate patterns and statistical properties within the data they process. This doesn’t necessarily mean that bigger models handle a greater volume of data; rather, they excel in modelling the underlying statistical distributions and complexities of the data more accurately. However, when thinking about how AI will provide value to business, large language models have a number of important limitations.
  • Without going into much detail about parameters, as it is a complex matter, one simple issue is the ‘size’ of the AI model. For example, LLMs often have more than 100 billion, or even 1 trillion, parameters, while an SLM might have between 100 million and 10 billion (a rough sizing calculation follows this list).
  • The number of parameters affects the computing power the model needs to work. This is a key reason why running and maintaining large language models is expensive, making them unaffordable for certain uses. LLMs usually require expensive clusters of GPUs (costing millions of dollars a year), while an SLM can be run on a single GPU (from under £1,000 per month). SLMs are cost-effective; LLMs are cost-intensive.
  • Additionally, LLMs often exhibit higher latency (delays in producing a response) than smaller models. The slower processing time can make LLMs less convenient to deploy, as they require more time to produce results (or make applications at scale impossible).
  • The size of an LLM also affects the model’s performance in bespoke applications. Large language models have been developed with the idea of ‘holding all the knowledge in the world’. If you need domain-specific answers, the generalisation of a large language model is not going to give you an accurate answer, as it has not been trained on the type of information the desired answer requires.
  • In other words, LLMs are generalists, while SLMs are experts.
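
To make the ‘size’ point concrete, here is a back-of-envelope sizing sketch in Python. It is a minimal illustration assuming 16-bit weights (2 bytes per parameter); real deployments vary with precision, quantisation and runtime overhead.

    # Back-of-envelope memory needed just to hold a model's weights,
    # assuming 2 bytes per parameter (16-bit precision).
    def weights_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
        return n_params * bytes_per_param / 1e9

    print(weights_memory_gb(7e9))   # ~14 GB: a 7bn-parameter SLM fits on one modern GPU
    print(weights_memory_gb(1e12))  # ~2,000 GB: a 1tn-parameter LLM needs a GPU cluster

On these rough numbers, the single-GPU versus GPU-cluster cost gap described above follows directly from the parameter count.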

Distillation

To understand distillation, it is important to provide some context about how machine learning models are developed.

  • One of the beauties of artificial intelligence is the way it combines mathematics with words. If we were to be very simplistic, we could say that the underlying secret of how machine learning works is statistics. That is why the data (input) used to train a model will determine the quality of the output. For machine learning to work you need ‘data, data, and more data’.
  • Given the importance of data in a model, the quality of such data is paramount. As we are teaching machines how to think like humans, we need to give them data that will contain the information needed for whatever we want from the AI, as well as the answers we would expect.
  • This process of labelling raw data to provide a machine with significance and context is called data annotation. In traditional machine learning these annotations happen manually, making the process slow, inefficient and expensive.
  • Here is where distillation comes in. Instead of manually annotating vast amounts of data to train a model, we annotate a sample of the data, which is then used to train a teacher LLM. This powerful model is capable of creating new data points from the information provided, or what we refer to as synthetic data. When we refer to distillation, it is the process of using an LLM to generate synthetic data, which is then used to train a smaller, task-optimised SLM. This process effectively transfers knowledge from the ‘teacher’ (the LLM) to the ‘student’ (the SLM). In other words, we are using generic AI to build domain-specific AI without the need for thousands of human hours spent on labelling high quality data to train a model. Distillation is an innovative technology designed to make the most of a powerful LLM, by enabling it to train specialised SLMs and ensuring that the data utilised is of the highest quality and at the lowest cost.

Perhaps we could give an example to illustrate distillation?

An example of distillation is the teacher-student paradigm. We start with a very large LLM which has been pre-trained on web-scale data. These models are capable of performing many tasks given an in-depth explanation of the problem. This is generally referred to as “prompting”.

From this large teacher model, we can generate large quantities of synthetic data that is then used to train a smaller language model (SLM) on the specific task. This can be seen as “teaching by example”, as if the teacher is sitting next to the student and showing how to solve the problem at hand.
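
As a concrete illustration, the sketch below walks through the loop just described: a small seed set is sent to a teacher model, the teacher’s answers become synthetic training pairs, and those pairs are used to fine-tune a student. Function names such as call_teacher_llm and train_student are illustrative stand-ins, not Malted AI’s actual pipeline.

    # Minimal sketch of teacher-student distillation (illustrative names only).
    from dataclasses import dataclass

    @dataclass
    class Example:
        prompt: str
        response: str

    def call_teacher_llm(prompt: str) -> str:
        """Stand-in for a call to a large 'teacher' LLM (e.g. via an API)."""
        return f"[teacher's answer to: {prompt}]"

    def generate_synthetic_data(seed_prompts: list[str]) -> list[Example]:
        """Step 1: use the teacher to label/expand a small seed set."""
        return [Example(p, call_teacher_llm(p)) for p in seed_prompts]

    def train_student(dataset: list[Example]) -> None:
        """Step 2: fine-tune a small 'student' model on the teacher-generated
        pairs. Shown as a placeholder; in practice this would be a standard
        supervised fine-tuning run over (prompt, response) pairs."""
        for ex in dataset:
            pass  # feed ex.prompt -> ex.response into the fine-tuning loop

    seeds = ["Summarise clause 4.2 of the contract.",
             "Which regulation governs customer complaints?"]
    synthetic = generate_synthetic_data(seeds)
    train_student(synthetic)
    print(f"student trained on {len(synthetic)} synthetic examples")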

Synthetic Data

  • Humans have previously been the main “generator” of data (Google click data, Wikipedia, website contents, etc.). When we refer to synthetic data, we mean a non-human process (i.e. machine learning model and/or a template) that is used to create a dataset without manual annotators.
  • With LLMs becoming highly capable in language tasks, they can generate synthetic data more quickly and more cheaply than expensive humans. For example: Google trains its search engine from user queries (what you enter into the search bar) and click data (what you click on after you see the results of the initial query). With LLMs, we could generate synthetic queries and create automatic mappings between queries and web pages.
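
A minimal sketch of that search-engine example, under the assumption that llm_generate stands in for any text-generation call (it is not a real API): the model invents plausible user queries for each page, yielding automatic query-to-page training pairs without click logs.

    # Hypothetical sketch: synthetic (query -> page) pairs from page contents.
    def llm_generate(prompt: str) -> str:
        return "example user query"  # placeholder for a real LLM call

    pages = {
        "https://example.com/refunds": "Our refund policy allows returns within 30 days...",
        "https://example.com/shipping": "Standard shipping takes 3-5 business days...",
    }

    synthetic_pairs = []
    for url, text in pages.items():
        # Ask the model to invent a query a user might type to find this page.
        query = llm_generate(f"Write a search query answered by this page:\n{text}")
        synthetic_pairs.append((query, url))

    print(synthetic_pairs)  # training data for a retrieval model, no click logs needed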

Digging Deeper

That is helpful, thanks, but it does raise a few questions which might appear a matter of semantics yet can lead to confusion if the meaning is not clarified.

  • When you refer to the ability of SLMs to ‘understand, generate and manipulate language’, the last two seem clear, but it might be helpful to spend some more time on what is meant by ‘understand’.
  • Could ‘understand’ be replaced by ‘recognise’? It seems that AI builds up a picture of meaning and context through repeated examination of examples, which allows the progressive identification of letters through to words, phrases, sentences and so on.
  • Recognition of the appropriateness of context then allows generation of language. Is it in this sense that ‘understand’ is used?

Yes. ‘Recognise’ is a good word for what SLMs are doing. They build ever more complex pattern recognition based on the data on which they are trained. Naturally, the complexity and contents of the training data have a strong influence on the types of patterns a model is able to recognise. These patterns help the model ‘predict’ which word comes next.

This brings up another frequently used term, ‘domain expertise’. Again, could you elaborate a little here?

  • As humans, we learn to become “experts” at various tasks, e.g., lawyers learn the law and train to work at a legal firm. As they become experts they acquire knowledge and learn complex workflows specific to their trade, e.g., knowing where to find relevant case law and how to use the information it contains.
  • Getting SLMs to accomplish these more specialist tasks requires equivalent “training” and a breaking down of these complex workflows. In a multi-step workflow, each student SLM may be tasked with a very specific step for which it has been trained.
  • At Malted AI we develop a multi-step process by which multiple data sources and student SLMs may be combined to solve a problem. This is deployed to achieve a very specific task that often mimics what a human would do, for example editing a legal contract step by step according to a given set of instructions.
  • Malted AI works with domain experts to build specialised systems that can accomplish specific tasks. In effect we aim to leverage the capabilities of multiple SLMs to build custom pipelines (a simplified sketch of such a pipeline follows).
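
Below is a simplified sketch of what such a multi-step pipeline might look like for the contract-editing example. Every function name here (find_relevant_clauses, propose_edit, check_edit) is hypothetical; the point is that each stage can be handled by a separate, narrowly trained student SLM, with a human reviewing the output.

    # Hypothetical multi-step SLM pipeline for editing a legal contract.
    # Each function stands in for a separate, task-trained student SLM.

    def find_relevant_clauses(contract: str, instruction: str) -> list[str]:
        """Student SLM 1: retrieve the clauses the instruction applies to
        (here reduced to a keyword filter for illustration)."""
        return [c for c in contract.split("\n\n") if "liability" in c.lower()]

    def propose_edit(clause: str, instruction: str) -> str:
        """Student SLM 2: draft the edited clause."""
        return clause + " [edited per instruction]"

    def check_edit(original: str, edited: str) -> bool:
        """Student SLM 3: verify the edit before a human reviews it."""
        return edited != original

    def edit_contract(contract: str, instruction: str) -> list[tuple[str, str]]:
        results = []
        for clause in find_relevant_clauses(contract, instruction):
            edited = propose_edit(clause, instruction)
            if check_edit(clause, edited):
                results.append((clause, edited))  # queued for human review
        return results

    contract = ("1. Liability. The supplier's liability is unlimited.\n\n"
                "2. Term. This agreement runs for 12 months.")
    print(edit_contract(contract, "Cap the supplier's liability at fees paid."))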


So just to try and summarise here: the ‘magic button’ interpretation of an AI SLM is deeply flawed, i.e. one cannot simply pose a question and expect AI to retrieve, analyse and report back?

For AI to operate and produce commercially useful information, the SLM requires specific topic-related knowledge to inform its design?

  • As with any other technology, the successful deployment of AI depends heavily on expert understanding of the problem at hand and on avoiding the wrong tool for the job. Since AI has many layers to its design (the choice of data to train on, an algorithm to model that data, infrastructure to house the trained algorithm), achieving a return on investment requires skill in both the AI stack and the target domain. If you use a hammer to glue two bits of wood together, it’s not going to work.
  • Failure to do this may lead to situations where people believe something is working when in fact it isn’t. This is particularly true with language models, as they are very convincing at creating things that “seem” right but actually aren’t.
  • As the quality of the data and the way the model is trained will determine the output of the AI, it is also important to specify the desired outcome and train the model with that in mind. An SLM will only be as good as the way it has been trained. If one needs highly accurate and domain specific outcomes from a model, that should be included in its design.

It is my impression that although almost all companies want to embrace the potential of AI, it is a reasonably small subset who have a clear, defined view of what they want to do. In other words, most companies are on their own learning curve and still at the more formative stages in understanding what the technology can bring and, perhaps more importantly, what they need to do to allow it to be applied?

Since the release of ChatGPT, business leaders have been under increasing pressure to adopt AI to find cost-reduction or revenue-generation opportunities. Most enterprises seem to be in the testing and proof of concept stage rather than achieving large commercial gains.

There is a big learning curve within businesses to understand what adopting AI involves. There are also misconceptions about how AI is developed and what resources are needed to make it work effectively. Implementing AI involves more than just having a fancy language model. How it runs, scales and is maintained will determine its sustainability and usefulness over time.

The costs involved can easily be overlooked.

Access to high quality data is another point that requires more education. One of the biggest challenges of adopting AI, if it is not understood well, is that it can get expensive very quickly. Return on investment calculations need to be carefully analysed.

Adopting AI also requires organisational cultural change, as it is not something that can be implemented in isolation. It needs to be embedded in a company’s operations, meaning it will affect the way people are used to working.

At Malted AI, we work with businesses primarily in regulated industries, such as banking, asset management, legal, medical, and more. There are already immediate gains from using the recently developed capabilities of SLMs to advance simple tasks at scale.

Our experience is that business leaders are most likely to be successful if the following apply:

  • There is an internal “understanding” of the opportunities and limitations of language models;
  • Investments are both economically beneficial and technically feasible;
  • They look creatively for inefficiencies in their business that lie beyond the most obvious and visible uses of AI (e.g. chatbots, digital assistants);
  • They focus on tasks with reasonably predictable positive returns rather than expecting to achieve perfect outcomes with 100% accuracy;
  • They ensure they have the right business processes and use the best tools for the job, as well as good technical products.

 

It feels like some areas are ripe for the use of AI. They tend to be sectors where the data are relatively well defined and where historic archives are readily available.

For example, a collaboration between CodeX–The Stanford Center for Legal Informatics and the legal technology company Casetext enabled GPT-4, a leading LLM, to pass the Uniform Bar Exam, achieving marks that would have placed it in the top 10% of candidates.

In arenas where the use of precedent is key for diagnosis, whether that be legal or medical, it is not hard to see how AI can provide valuable assistance in the future?

  • Yes, areas that are based on language data (legal, professional services, software development, etc.) are inherently more suitable for SLM integration.
  • Because of the way machine learning models are trained, they generally learn to “pick up patterns” in, or at one extreme “memorise”, what they have seen during training.
  • SLMs are most likely to disrupt tasks that are repetitive and can be well defined. SLMs enable these tasks to be achieved at scale with low cost.
  • We believe many applications of AI will require a human-in-the-loop. This means we entrust the SLM to do the “busy work” while the human ensures its correctness.
  • Broadly speaking, any field that requires “busy work” involving text, images, or structured data is ripe for AI to play a role in.
  • As AI models and SLMs continue to improve, they become capable of handling more complex tasks at scale.
  • Humans will become more productive as a result, and you can already see this in the arts with AI image generation (Midjourney, etc.), in how people use AI in the loop of professional writing (Writer, ChatGPT), and in software development support (Replit, Microsoft Copilot).
  • However, we need to remember that the Bar Exam is a text-based evaluation designed for humans. Although impressive, these results do not mean that an LLM like GPT-4 would be able to completely substitute the role of a lawyer. Businesses need to understand how these models can effectively and safely be integrated into their workflows to realise value.

 

In terms of prospective clients, where do you see the greatest opportunity, and are there any obvious areas where client knowledge is at a sufficiently advanced level to allow immediate implementation?

Or is it effectively on a case-by-case basis, depending largely on the culture and individuals involved?

  • Companies need people that understand the opportunities and limitations of AI in order to develop high-value and safe outcomes.
  • We believe the best solutions require strong collaboration between domain experts who deeply understand the area and AI experts who understand the technological realities.
  • Given the complexity of deploying and managing AI solutions, we believe there needs to be alignment of both people and organisations.
  • For example, companies have a lot of internal know-how that is usually stored inside documents but is very hard to circulate. AI can allow you to access this knowledge in a natural way, finding and summarising/extracting the relevant information.
  • Another example could be the automatic creation of complex reports/documents that are customised for customers or internal use.
  • These examples are all projects that companies could implement quickly in the short term.

 

From what you say the finance industry looks to be a sector which could benefit enormously from the design and deployment of SLMs? For example, many banks are bedevilled by legacy issues in their records on the one hand and stringent regulations on the other.

This creates an environment where they can easily fall foul of the relevant regulatory regimes inadvertently, even if all other aspects of their business were operating efficiently.

Banks therefore have to staff heavily to fulfil their regulatory obligations whilst at the same time constantly running the risk that system inadequacy can negate all their efforts. What can SLMs do to provide a solution?

  • SLMs are well-suited to the finance industry due to the domain-specific scale required to deal with millions of customers while maintaining data security. Most financial sector CIOs and CTOs want their AI solutions deployed next to their data and SLMs offer a cost effective way of achieving this (LLMs would cost millions a year).
  • Some of the use cases we are focused on reflect the fact that SLMs are excellent at verifying/checking things, and can do this across millions of text documents/conversations/reports in a way that humans cannot. In the context of a bank, if we think about compliance, an SLM could check all of its material against regulations and guidelines, flag issues and suggest changes (a simplified sketch follows this list).
  • There are also other use cases that could be applied in banking relating to new regulations. One example is Consumer Duty. SLMs could equip banks with better monitoring, analysis and reporting capabilities about their customers by making better use of their customer interactions data.
  • The identification of vulnerable customers, better analysis of the root cause of complaints, personalising communications and preventing fraud are some of the areas where we have seen SLMs make a difference, helping banks offer better customer service and stay compliant.
  • Overall, we see significant opportunities for applying SLMs across multiple use cases within financial services, given the fundamental importance of accuracy, transparency, scale and data security.
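
To illustrate the compliance use case at a scale a human team could not match, here is a minimal sketch. The check_against_rules function is a hypothetical stand-in for a verification-tuned student SLM; everything else is plain plumbing.

    # Hypothetical sketch: an SLM flagging compliance issues across many documents.
    def check_against_rules(document: str, rules: list[str]) -> list[str]:
        """Stand-in for a verification-tuned student SLM: returns the rules
        the document appears to breach (a real SLM would reason over the text)."""
        issues = []
        if "guaranteed returns" in document.lower():
            issues.append(rules[0])
        if "risk" not in document.lower():
            issues.append(rules[1])
        return issues

    rules = ["Do not promise guaranteed returns", "Include a risk warning"]
    documents = {
        "email_0412": "Our fund offers guaranteed returns of 12% a year!",
        "email_0413": "Capital at risk; past performance is not a guide to the future.",
    }

    for doc_id, text in documents.items():
        issues = check_against_rules(text, rules)
        if issues:
            print(doc_id, "flagged for human review:", issues)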

SLMs seem ideally placed to improve both efficiency and effectiveness in corporate management. The advances in AI afforded by processing power and parallel processing should also open up significant advances in the natural sciences, including medicine and genetics.

It feels likely that the advances in decoding DNA combined with AI could give rise to new categories of drugs targeting specific diseases. Do you have insight on how this may happen?

  • Machine learning has shown great potential in modelling the physical properties of chemical compounds (AlphaFold). This is very useful because it can speed up the development of new drugs by significantly cutting down the first experimentation stages.
  • SLMs are also very good at analysing patterns, allowing technicians and doctors to find and cross-reference information across a range of scientific publications. This is a great way to empower doctors to quickly find the right information and make a diagnosis for their patients, based on all the latest information available.
  • The SLMs’ ability to analyse patterns could also mean that at some point SLMs can suggest combinations of components a human has not thought of before. If AI does some of the ‘heavy lifting’, it can increase innovative thinking and creativity among its users.

 

The history of scientific advance is riddled with examples of both serendipity and knowledge from an allied field overcoming incorrect conventional wisdom from more narrowly based practitioners. Does the processing power and the availability of data mean that both these barriers may be lowered, or could we just end up with a whole series of spurious relationships being pursued?

  • The constant growth in the computational power of modern systems allows the creation of more complex scientific models. Combining this with the large amounts of data now available will lower the barrier to creating scientific models.
  • It is, however, unlikely that LLMs like OpenAI’s will be able to generate meaningful cross-field relationships, as these usually require very deep knowledge of both fields.

Because we started with the specifics of the arena in which Malted AI is engaged, we perhaps fell into the trap of treating AI as a generic term. We know that AI is not new. I remember using AI in the late 1980s through the good offices of a friend from PhD days. LLMs may be relatively new in the public perception, but would it make sense to set out some of the sub-categories?

  • That is a very good observation. AI in the 1980s would have referred to human intelligence captured in an artificial medium: Good Old Fashioned AI [1]. This means experts making rules and systematising their expertise into symbolic rules at a very low level. This type of AI was time-consuming to build and maintain. Further, it was only as good as what could be coded. Whilst this form of AI would work on numerical data, it would have limitations when working with text due to ambiguity.
  • Machine learning by contrast is where the data determines the behaviour of the system, rather than rules made by people. As a result, behaviour and pattern recognising abilities can “emerge” without human intervention. This stops humans acting as a bottleneck.
  • It is also why systems have advanced so quickly and why researchers still don’t know what an LLM can or cannot do. Machine learning as a form of AI only became feasible recently thanks to the drastic reduction in the cost of computing (Moore’s law) and the abundance of data from the internet (which obviously was not there in the 80s).
  • The ability to store and process large datasets efficiently provided an ideal environment for training deep learning models, a type of machine learning inspired by neural networks in the human brain. Both Natural Language Processing (NLP) and computer vision are evolutions of this kind of AI.
  • They enable computers to understand, interpret and generate human language (NLP) and to analyse images and videos as a human would (computer vision).
  • AI’s evolution is making its use by enterprises far more feasible, as it transitions from a tool with human limitations, as AI was in the 80s, to a tool that can augment the work and powers of humans.

It feels like there is a bit of a feeding frenzy from private capital looking to invest in AI. What are your impressions?

  • AI clearly has great potential. Although there has been a lot of hype, there is also a huge opportunity to build billion or even trillion-dollar businesses. OpenAI, for example, generates $3.4bn in revenues.
  • We’ve seen a large quantum of money coming into OpenAI, Cohere, Anthropic and new companies such as Reka and Mistral. These companies build the LLMs to sell via an API, with users generally paying “per token” (meaning, roughly, per word; a rough cost calculation follows this list). However, these businesses incur large processing expenses to build and maintain these services, and there is competitive pressure on them to keep prices low.
  • We’ve seen cloud compute (AWS, GCP, Azure) and hardware/GPU manufacturers (Nvidia) do extremely well. Some startups are raising $60m in order to spend $50m on cloud computing just to build a model before raising the next venture capital round. This is very different from traditional VC/startup economics.
  • Many “AI startups”, often referred to as “wrapper companies”, are building on top of the model companies. Unless these companies are developing clearly differentiated products that are hard to recreate, it is unclear how they will grow and maintain market share.
  • Despite the “buzz” about AI being able to replace incumbent technologies, as a disrupter you still need to build a comparable core product before you can put AI on top. You need to believe either that AI for that use case is game-changing from a return-on-investment perspective, and/or that the core product will not take 2-5 years to replicate (otherwise the incumbent may have time to add AI themselves!).
  • In the last 6-12 months we have seen the beginning of a flight to quality and the perceived front runners / market dynamics are becoming clearer.
  • We believe the long-term winners will be those that are able to capture significant value from “real” use cases. Whether this is being a high-value part of a product or the whole product.
  • While it is true that AI solutions add value, there are significant underlying costs in running the systems and accessing high-quality data. This is frequently overlooked by LLM wrapper companies.
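
As an aside on the per-token pricing model mentioned above, the arithmetic is simple once a price is assumed. The figures below are illustrative placeholders, not any provider’s actual pricing.

    # Illustrative per-token cost arithmetic (placeholder prices, not a quote).
    price_per_1k_tokens = 0.01   # assumed price in dollars; varies by provider/model
    tokens_per_word = 1.3        # rough rule of thumb for English text

    words_processed = 1_000_000
    tokens = words_processed * tokens_per_word
    cost = tokens / 1000 * price_per_1k_tokens
    print(f"~${cost:,.2f} to process a million words at the assumed price")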

Some of the leading lights in the industry have raised serious concerns about the ability of AI to take the world into very dangerous territory. This could be along the lines of the film ‘Wargames’, where a hacker almost precipitates a nuclear war. Alternatively, a number of commentators have cited how technological shifts have lowered the bar on the ability to create deadly pathogens. How can one constrain access and impose regulation or will there be a move to ever greater surveillance to provide some form of protection against such events?

  • Safely deploying AI is as much about the business processes as the technology itself. Recognition of potential issues is prompting discussion of guardrails as an area of focus for the industry.
  • Regulation and property rights are also a salient topic – for example, whether companies like OpenAI can use internet data “for free” is a live question.
  • Equally, does AI need to be regulated like legal firms or banks? We believe open-source development has a big part to play in the sense that it allows people to explore and understand the data and provenance of models.
  • There is also risk around regulation itself. Will it mean that AI first-movers will be hard to dislodge and so will develop into monopolies or oligopolies? (Some cynics even say that the likes of OpenAI and Anthropic would like to be regulated for that reason!)
  • It is true that both LLMs and SLMs enable easier access to specialised topics. However, we are already seeing a lot of effort in trying to minimise access to potentially dangerous areas.
  • We also need to remember that in terms of specific knowledge, these models are learning from freely available sources. That simply allows much easier and faster access to this information.
  • The main concern with AI is its less deterministic nature. It needs to be emphasised that these are tools and we should not over-rely on them. It is less the risk of an “I, Robot” scenario and more the risk of misuse and unthinking acceptance of AI-generated results.
  • In the end what AI is doing is approaching problems that are less well defined and where a 100%-reliable solution is not realistic. This must not be forgotten.

Summary thoughts

As always, it is incredibly helpful to be able to discuss a topic with experts operating at the leading edge of a technology. Understanding what a technology cannot do is just as important as understanding what it potentially can do. In addition, the detail of how the technology can be applied is critical. A number of key points emerge from these discussions.

In many applications the technology is not agnostic. It requires specific domain knowledge if it is to be effectively applied. Partnership with users is critical. This is likely to be a slow process since the evidence is that the vast bulk of users do not understand LLMs and SLMs sufficiently to allow their introduction.

The knowledge gap lies on both sides. The technologists have to develop enough domain expertise for the dialogue necessary to create commercial opportunities to take place. The size of the gap is very specific to the potential user. Some users are sophisticated and the gap is narrow, whilst others are some distance away from formulating use cases. One would expect that, as early deployments roll out, this issue will gradually evaporate.

Paradoxically, some of the largest positive impacts may be eventually recorded in less data-sophisticated companies. There are many large concerns where operations are bedevilled by system legacy issues. These are most common in companies where their history has seen them absorb many other businesses. Banks are a prime example of this with prior amalgamations and redundant systems frequently making data access difficult. The ability to access, collate and process unstructured data in such circumstances will be invaluable.

LLMs are not naturally deterministic. In other words, asking the same question can result in different answers. This is less of a problem for distilled models, but users need to be aware of the issue. Equally, there have been examples of what is termed ‘hallucination’, where a model has effectively ‘made up’ an entirely plausible-looking answer, or provided a non-existent reference. This reinforces the importance of both domain expertise in the process of creating tools and human involvement whilst operating them.

To the extent that the same data are recycled by LLMs and the output stored as further training data, there must be a potential issue of what might be termed ‘incest’, in the sense that the dataset is not broad enough and reinforces its own errors.

Meanwhile, the battle over property rights is only just beginning. The internet was blessed by the lack of control afforded to the infrastructure providers. The inability to charge users differentially (net neutrality) gave the ‘platform’ companies the opportunity to develop massive businesses, leaving the infrastructure providers as ‘dumb pipes’ with pricing models that effectively mirrored utilities. If LLMs can scrape data from any source and repackage it without payment, the owners of the original intellectual property will find their work effectively appropriated.

There is no doubt that the latest iterations of AI will have a lasting impact on business and employment patterns. Although it will take some considerable time, it should be expected that many white-collar occupations will be on a declining trend. Others will increase and be higher value-added, but they are unlikely to compensate for the losses.

This will carry with it clear implications for the structure of further education. In the sciences one should expect a leap forward in areas such as drug discovery given the additional analytical input that machine learning can provide. Similarly, diagnostics should see a significant uplift, particularly where large datasets exist such as in the UK NHS.

As always, the gold rush of investor monies will be met with many failures as well as some spectacular successes. Inevitably there will be infrastructure overbuild in the early stages as new entrants compete to develop offerings. The cost of engaging in the LLM competition is significant, not simply in the direct cost of compute power but also in the associated energy costs. However, this is just part of the repeating cycle, and whilst the associated mortality rate will cause pain for many early investors, the technology roll-out will continue apace, bringing with it a profound impact on the global economy.

 

Malted AI

Detailed below is some more information on the company. For the purposes of full disclosure, Dr Sandy Nairn is an investor and Non-Executive Director of Malted AI.

Background: Malted AI capabilities

High level: We enable enterprises to build smaller, more focused AI models with greater performance at a fraction of the cost.

Technology: Large Language Models (LLMs) lack domain specificity and are not cost-effective at scale. Conversely, Malted AI builds custom Small Language Models (SLMs) that become experts at your problem and are 10-100x smaller. Our distillation technology automatically creates high-quality data that would otherwise have required thousands of human hours to annotate manually. We use this rich dataset to train task-specific SLMs that are optimised for your specific use case.

Vision: Our vision is to create significant value within enterprise environments by addressing complex problems that general AI cannot solve. We believe smaller, specialised models (SLMs) coupled with distillation are required to deploy AI at scale.

We are at the start of the AI revolution and are just scratching the surface of what’s possible. Many other companies are focused on large and general AI models; however, we believe there are better ways to unlock AI’s true potential.

Product: Malted AI enables enterprises to build custom AI applications utilising distilled Small Language Models (SLMs), trained on their proprietary data in a secure environment. The company leverages knowledge distillation to inject domain-specific knowledge into smaller, focused SLMs that can scale in production. We combine companies’ internal knowledge with our ‘teacher’ model to generate scalable AI-augmented datasets (“synthetic data”). These datasets then train a series of ‘student’ models to solve complex real-world problems where general AI solutions (e.g., ChatGPT, GPT-4) fail, resulting in cost reductions by a factor of 10-100x. Malted AI’s platform additionally manages the challenging support, maintenance, and automatic scaling requirements of production SLM deployment.

Malted AI has a strong commercial focus and market demand, working with several enterprise customers to solve domain-specific problems. Our current business model comprises onboarding onto the Malted AI Platform and a monthly licence covering software, support, maintenance, and scaling. Example projects:

1. Consumer duty: Creating a customer classification system for financial services companies ensuring they identify and support vulnerable customers.

2. Legal Clause Generation: Building a solution to automatically edit complex legal clauses using multiple specialised SLMs.

3. Patent Search & Understanding: Creating a domain-specific search and generation system to answer complex technical questions.

About the author

Dr Sandy Nairn CFA, FRSE has 40 years of experience in fund management and investment research. He is Executive Director of the Global Opportunities Trust plc (www.globalopportunitiestrust.com) and the author of three critically acclaimed books about the stock market, most recently The End of The Everything Bubble (Harriman House, 2021). He is currently working on AI as a follow-up and extension to Engines That Move Markets (Harriman House, 2018), which examined technological developments in the modern age.