
LARGE LANGUAGE MODELS - The State Of The Art

An Introduction.

AI Revolution

As AI technology continues to evolve, language generation models have emerged as a promising area of research. These models are designed to generate human-like text based on a given prompt. They have the potential to revolutionize the way we interact with machines and automate many tasks that previously required human intervention.

One of the most significant advancements in this field is the development of large language models (LLMs). These models are trained on vast amounts of text data and can generate coherent and fluent text that is often indistinguishable from human writing. LLMs can be used for a wide range of applications, including chatbots, search engines, summarization tools, and even code generation.

However, LLMs are not without their challenges. They can sometimes produce biased or problematic outputs due to the data they are trained on or the way they are prompted. Additionally, there are concerns about the environmental impact of training these models, which require massive amounts of computational resources.

Despite these challenges, many new solutions have appeared on the internet that use prompts to create an output. Examples include ChatGPT, a chatbot developed by OpenAI built on GPT-4 (and now GPT-4.5); CoPilot (also based on OpenAI's GPT models); Brave's LEO, which uses a user-selected LLM; Google's Gemini (using the Gemini model family); X's Grok (using xAI's Grok model); and many others, all of which use LLMs to generate responses to user queries. Another example is AI Dungeon, an interactive fiction game that uses LLMs to generate storylines based on user inputs.

One point of confusion with LLMs is the tendency of the public to conflate the LLM with the application (or agent) wrapping it. You will often hear ChatGPT and GPT-4 used interchangeably in discussions, but these are actually distinct systems - like a finance system and the database engine on which it runs. ChatGPT is a chatbot that uses GPT-3 and its successors as its LLM or "database". As such, ChatGPT is intended to be publicly facing, has filters and constraints built in (called guardrails), and uses (at the time of writing) about 20 billion parameters versus GPT's 175+ billion parameters. Bing AI Chat (now CoPilot) is similarly a chatbot agent that wraps GPT with a conversational interface and ties in with many other MS Windows tools to provide a range of services, from automating Windows tasks, to analysing spreadsheets, to supporting code completion in programming editors. As search engine enhancers, the LLM agents use an algorithm called RAG (Retrieval Augmented Generation): a search engine result is vectorised using the same tokeniser used by the LLM they wrap and then provided to the LLM as part of an augmented prompt, which augments the LLM's knowledge base when formulating a response. In this scenario the LLM is used primarily as a sentence generator, with the RAG outcome forming the factual content of the response.
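To make that RAG flow concrete, the sketch below shows one minimal retrieve-and-augment loop. It is illustrative only: the document corpus, the prompt template, and the choice of the sentence-transformers library with the all-MiniLM-L6-v2 embedding model are assumptions made for the example; a production agent would use a proper vector store and the wrapped LLM's own embedding space.

    # A minimal RAG (Retrieval Augmented Generation) sketch.
    # Assumes the sentence-transformers package; the corpus and
    # prompt template here are illustrative placeholders.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Stand-in for search engine results already fetched for the query.
    documents = [
        "RiskManager is a governance, risk and compliance platform.",
        "LLMs generate text by predicting likely next tokens.",
        "RAG augments a prompt with retrieved reference text.",
    ]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    doc_vectors = embedder.encode(documents)            # vectorise the corpus

    def retrieve(query: str, k: int = 2) -> list[str]:
        """Return the k documents most similar to the query (cosine similarity)."""
        q = embedder.encode([query])[0]
        scores = doc_vectors @ q / (
            np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
        )
        return [documents[i] for i in np.argsort(scores)[::-1][:k]]

    def build_augmented_prompt(query: str) -> str:
        """Embed the retrieved text in the prompt sent to the wrapped LLM."""
        context = "\n".join(retrieve(query))
        return (
            "Using only the context below, answer the question.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )

    print(build_augmented_prompt("How does RAG work?"))

The essential design point is that retrieval happens outside the model: the LLM never changes, only the prompt it receives is enriched.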

LLMs are semantic probability engines that generate text based on the likelihood of that text following the previously generated sequence of text and its semantic concepts and context, and/or in response to a provided sequence of text in the form of a prompt. They are essentially probabilistic and non-deterministic. LLMs do not have a notion of "understanding" or knowledge per se; they treat a body of data as a recommendation engine for how elements of that data relate to other elements within the same body of knowledge. They are generating text that is statistically consistent with the prompt provided.
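That core mechanism can be shown with a toy next-token sampler. The five-word vocabulary and the logit values below are invented for illustration; a real LLM performs the same softmax-and-sample step over tens of thousands of tokens, which is why the same prompt can yield different text on different runs.

    # Toy illustration of probabilistic next-token generation.
    # Vocabulary and logits are invented; real models score ~50k+ tokens.
    import numpy as np

    vocab = ["cat", "sat", "on", "the", "mat"]
    logits = np.array([0.5, 2.1, 0.3, 1.2, 1.8])  # model scores for the next token

    def softmax(x: np.ndarray, temperature: float = 1.0) -> np.ndarray:
        """Convert raw scores into a probability distribution."""
        z = (x - x.max()) / temperature
        e = np.exp(z)
        return e / e.sum()

    probs = softmax(logits)
    rng = np.random.default_rng()

    # Sampling, not argmax: the same prompt can produce different continuations.
    for _ in range(3):
        print(rng.choice(vocab, p=probs))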

Surprisingly, this characteristic makes LLMs better at creating (where factual linking is not required) than at interpreting, where accuracy and correctness are paramount. Unlike earlier solutions, such as expert systems, the probabilistic nature of LLMs means they are weaker when it comes to the goals of correctness and accuracy. When an LLM deviates from reality by producing a convincing answer to a question that is not grounded in known fact, we call that "hallucination". The better the training set, and the more relevant it is to the problem at hand, the less likely such hallucinations are to occur. "Better", however, is a subjective concept and requires more consideration. As in Total Quality Management, where quality was defined not in absolute terms but in "fit for purpose" terms, the quality of the training set is determined by how fit it is for the targeted purpose. Typically, deficiencies in the training set fall into one or more of the following categories:

  • Overfitting: Overfitting occurs when the data is too narrowly defined and the model is overtrained on that narrow data set. It is the typical "you don't know what you don't know" problem: if all you have is a hammer, then every screw is a nail. Overfitting is the result of the neural network learning a non-diverse data set and applying it to every problem encountered. In the early days of character recognition networks it arose when a network was trained on perfectly formed and positioned letters, making it incapable of recognising slightly imperfect or oddly positioned letters. The solution was to introduce errors into the training set - badly formed letters and a wider range of letters (see the augmentation sketch after this list). Of course, it is a balancing act, as too much noise in the dataset prevents the network from reliably distinguishing one letter from another. An overfitted network cannot generalise well to new data.
  • Data Errors: Data that is too noisy, mislabeled, or misclassified will obviously lead to the model simply learning the wrong thing. If I teach the model that a cat is a fish, I cannot expect it to recognise that a cat is not a fish when I show it a fish. Aside from the obvious problem of mislabeling, bias in the data set leads to a similar problem: if the model learns one view of the world much more strongly than another, it will tend to interpret what it sees, on a probabilistic basis, as belonging to the view of the world that dominated its training set. Further, as the model has no "instinct" for right or wrong - no sense of credible versus incredible - a set of lies learned as truth is as much a truth as a set of truths learned as truth. In a sense this is no different from a human: we grow and mature within a culture that holds certain values as important, and as adults we continue to interpret the world in terms of those cultural values. It does not make us right, and those values could be complete hallucinations of reality, but they are truths to us and to those with whom we share a common culture.
  • Data sparsity: The wider the data domain, the more likely any subdomain will be sparsely covered. Where data is missing or insufficiently detailed, the model is likely to invent what is required, using other data to bridge the gap. This is actually a desired behaviour in many cases, but it can lead to hallucination when the bridging performed is in fact in error. Clearly the solution is to refresh the training dataset and fill in holes in the knowledge, but this may be more easily said than done, as identifying that holes exist can itself be a problem.
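The classic remedy for the overfitting scenario described above is data augmentation: deliberately distorting training examples so the network also sees imperfect, shifted letters. The sketch below is a minimal, hypothetical illustration using NumPy; the 8x8 "letter" bitmap, shift range, and noise level are invented for the example.

    # Minimal data-augmentation sketch: inject noise and shifts into
    # training images so a character recogniser does not overfit to
    # perfectly formed, perfectly positioned letters. The 8x8 bitmap
    # and parameter values are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def augment(letter: np.ndarray, noise: float = 0.1) -> np.ndarray:
        """Return a slightly corrupted, randomly shifted copy of a letter bitmap."""
        # Shift the letter by up to one pixel on each axis.
        shifted = np.roll(letter, shift=tuple(rng.integers(-1, 2, size=2)), axis=(0, 1))
        # Add pixel noise - enough to vary the data, not enough to destroy it.
        noisy = shifted + rng.normal(0.0, noise, size=letter.shape)
        return np.clip(noisy, 0.0, 1.0)

    # A crude 8x8 "L" shape standing in for a training example.
    letter_L = np.zeros((8, 8))
    letter_L[1:7, 2] = 1.0
    letter_L[6, 2:6] = 1.0

    # One clean original yields many imperfect training variants.
    training_batch = [augment(letter_L) for _ in range(16)]
    print(len(training_batch), "augmented variants generated")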

While the systems on which LLMs are trained are usually vast, and the models are large during training, once trained and fine-tuned the models can be surprisingly small and efficient to use. They are in many cases small enough to run on a desktop PC or even (with some of the newer mini models) a mobile phone!
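As a rough illustration of how small a usable model can be, the snippet below loads distilgpt2 (roughly 82 million parameters) through the Hugging Face transformers library and generates text on an ordinary desktop CPU. The model choice and generation settings are assumptions made for the example, not a recommendation.

    # Running a small language model locally on a desktop CPU.
    # Uses the Hugging Face transformers library; distilgpt2 (~82M
    # parameters) is chosen here purely as an illustrative small model.
    from transformers import pipeline

    generator = pipeline("text-generation", model="distilgpt2")

    result = generator(
        "Large language models are",
        max_new_tokens=30,   # keep the continuation short
        do_sample=True,      # probabilistic, non-deterministic output
    )
    print(result[0]["generated_text"])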

These systems demonstrate the potential of LLMs for creating engaging and interactive experiences for users. However, they also highlight the need for responsible development and deployment of these technologies. As AI technology continues to advance, it is essential that we consider the ethical implications of these systems and ensure that they are developed in a way that benefits society as a whole.

Language generation models represent an exciting area of research in AI technology. With the development of large language models and new solutions appearing on the internet that use prompts to create an output, we are seeing new possibilities for automating tasks and creating engaging experiences for users. We must, however, also be mindful of the challenges associated with these technologies and work towards responsible development and deployment.


...Next: Overview of current LLM Solutions....