
LARGE LANGUAGE MODELS - An Overview of Current Applications

A Survey Of The Current Systems.

Current Systems

Large Language Models, Agents & Tools

Large language models (LLMs) are designed to generate human-like text based on a given prompt. They are revolutionizing the way we interact with machines and automating many tasks that previously required human intervention. The program that interfaces the LLM to the human or machine is called an agent. 

Tools used by Agents that use LLMs

An agent uses an LLM as a "brain" to decide how to use a tool. In the case of a chatbot, the tools are the text input and display devices (and potentially a microphone and speakers) that the bot (the agent) uses to interact with a user, while the "brain" it consults for content is the LLM. In the case of a coding support agent, the tool is the IDE (integrated development environment) or editor in which the human user is editing software and with which the agent (a plugin such as Copilot) is assisting. In the case of a robot, the voice encoder/decoder, actuators, vision system and touch sensors that comprise the robot are the tools. In some cases an application can be both a tool and an agent, depending on the use case.
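To make the agent / tool / LLM relationship concrete, here is a minimal sketch of an agent loop in Python. The call_llm stub, the TOOLS table and the JSON "decision" protocol are all hypothetical placeholders invented for illustration, not any particular vendor's API; real agent frameworks follow the same basic pattern but with far more machinery.

```python
import json

def call_llm(prompt: str) -> str:
    """Stub standing in for a real chat-completion API call (hypothetical).

    A real agent would send `prompt` to its chosen LLM here; this stub just
    returns a canned decision so the loop below can be run as-is.
    """
    if "Tool search_docs returned" in prompt:
        return json.dumps({"tool": None, "input": "", "answer": "Here is what the docs say..."})
    return json.dumps({"tool": "search_docs", "input": "quaternion rotation in C++", "answer": None})

# The agent's tools: plain Python callables the LLM can request by name.
TOOLS = {
    "search_docs": lambda query: f"(top documentation hits for: {query})",
}

def agent(user_request: str, max_steps: int = 5) -> str:
    """Minimal agent loop: the LLM (the 'brain') decides which tool to use; the agent runs it."""
    transcript = f"User request: {user_request}\n"
    for _ in range(max_steps):
        reply = call_llm(
            transcript
            + 'Reply with JSON: {"tool": <name or null>, "input": <string>, "answer": <string or null>}'
        )
        decision = json.loads(reply)
        if decision["tool"] is None:                          # the LLM says it has an answer
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["input"])   # the agent, not the LLM, executes the tool
        transcript += f"Tool {decision['tool']} returned: {result}\n"
    return "No answer within the step limit."

print(agent("How do I perform a quaternion rotation in C++?"))
```

The key point of the sketch is the division of labour: the LLM only produces text describing what it wants done, while the agent code interprets that text, runs the tool, and feeds the result back into the next prompt.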

Over the last five years, LLMs (and commensurately agents) have evolved significantly. One of the most significant advancements in this field is the development of large-scale pre-trained language models. These models are trained on vast amounts of text data and can generate coherent and fluent text that is often indistinguishable from human writing. LLMs can be used for a wide range of applications, including chatbots, search engines, summarization tools, and even code generation.

One of the most popular proprietary LLMs (at the time of (re)writing) is GPT-4, developed by OpenAI. GPT-3 had 175 billion parameters, while GPT-4 is reported to have approximately 1.75 trillion parameters, and it can generate high-quality text for a wide range of applications. Parameters are the numeric weights and biases at the neuronal level that the LLM adjusts in order to learn. Like Gemini (Google), Claude (Anthropic), LLaMA (Meta), Mistral (Mistral) and Grok (xAI), GPT is a general-purpose LLM trained on a large corpus of narrative text and then fine-tuned with conversation patterns. LaMDA (Language Model for Dialogue Applications), also developed by Google, is an LLM trained on dialogue from the outset and designed to understand natural language conversations and provide relevant responses. Which approach is better for conversational AI is an open question today: LaMDA was released in 2021 and is thus "old" by LLM standards, while the other models mentioned were released after that date and have been substantially enhanced. Their larger training sets and fine-tuning may have given them conversational capabilities indistinguishable from those of the earlier LaMDA.

So far we have concentrated on the large (generally proprietary) LLMs, but many now have mini versions small enough to run on a mobile device, and some, like LLaMA (Meta) and Grok (xAI), are released as open-source models (usually through huggingface.co). In the case of LLaMA the models are released as fine-tuned and optimised end products ready for deployment onto desktop systems with advanced graphics cards to accelerate inference, while in the case of Grok the raw model has been released, which requires fine-tuning before it can be used effectively. There are many other models available on Hugging Face, enabling developers and experimenters to download a model and apply it directly to their own use case on local hardware. This is a legitimate use of the models for both commercial and non-commercial purposes, although some skill in coding and AI technologies in Python or C++ will be required. Alongside the large models there are also many mini versions which, in some specific problem domains, actually outperform their larger cousins and in many cases are otherwise sufficient for general use. See Hugging Face – The AI community building the future for access to a very large library of LLMs and associated agents, as well as detailed instructions on how to use them. Both LM Studio (LM Studio - Discover, download, and run local LLMs) and Leo in the Brave Browser (Brave Browser Download | Brave) support connecting to and loading a variety of open-source LLMs. The first advantage of a local LLM is that all the data shared with it in the prompt stays local; the second is that it is a lot cheaper than paying hosted usage fees.
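To illustrate how little code a locally hosted open-source model requires, here is a minimal sketch using the Hugging Face transformers library in Python. The model id shown (TinyLlama/TinyLlama-1.1B-Chat-v1.0) is simply one small, freely downloadable example chosen for illustration, not a recommendation; any text-generation model on the Hugging Face hub can be substituted.

```python
# pip install transformers torch
from transformers import pipeline

# Downloads the model weights on first run and caches them locally;
# after that, everything runs entirely on your own hardware.
generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example small open model (assumption)
)

prompt = "In one sentence, what is a large language model?"
result = generator(prompt, max_new_tokens=60, do_sample=False)
print(result[0]["generated_text"])
```

On a machine with a reasonable graphics card the model will use it automatically; on CPU-only hardware the same code still works for small models like this one, just more slowly.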

Use Cases

The LLM is essentially just a model of how semantic elements (called tokens, roughly words or word fragments) relate to each other; it is the agents (programs that wrap the LLM, feeding it data in the form of prompts and receiving, responding to, and acting on the responses) that give the LLM its application and usefulness. Chatbots are the kind of agent with which most users are now familiar. In some cases the agents are designed to work with multiple LLMs, allowing the user to direct a request to whichever model best suits the purpose.
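The "tokens" referred to above can be inspected directly. The short sketch below uses a Hugging Face tokenizer (GPT-2's, chosen purely as a freely available example) to show the sub-word units a model actually operates on in place of whole words.

```python
from transformers import AutoTokenizer

# GPT-2's tokenizer, used here only as a readily available example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large language models predict the next token."
tokens = tokenizer.tokenize(text)   # the sub-word pieces the model sees
ids = tokenizer.encode(text)        # the integer ids actually fed to the model

print(tokens)  # e.g. ['Large', 'Ġlanguage', 'Ġmodels', ...] where 'Ġ' marks a leading space
print(ids)
```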

Some LLMs are naturally better at some tasks than others because of their training sets, architecture or fine-tuning. For example, Claude is said to be particularly strong at program code generation, LaMDA is stronger in conversational dialogue, and OpenAI's o1 is specifically trained to perform complex reasoning tasks by generating a long internal chain of thought before responding, so it also excels at mathematics, science and coding problems. Among the classic use cases are:

  • Code generation. LLMs can be used for code generation. For instance, StarCoder, developed by Hugging Face, was one of the earlier state-of-the-art LLM code generators able to generate code snippets from natural language prompts, and it has since been joined by several other powerful code generators such as Code Llama (Meta), an open-source LLM that can generate code for various programming languages. GPT-4, OpenAI's o1, Claude Sonnet, Grok 3 and Mixtral are all capable of developing code in response to a descriptive prompt. In some cases you can show the LLM agent a picture of a web page or screen you wish to build and the LLM will generate the code to produce it. Code generators are still very much in their early stages: they excel at small code snippets, at proposing code improvements, and at lookup-style questions of the form "how do I perform a quaternion rotation in C++", but they still tend to underperform on complex or long stretches of code and on complete applications. The best use case at the moment is still as a productivity enhancer for a strong coder who is capable of reading the code produced and adapting and correcting it.
  • Content creation (images & text). A number of LLMs are dedicated to content creation. For example, DALL-E, developed by OpenAI, can generate images from textual descriptions. Another example is AI Dungeon, which uses LLMs to generate storylines based on user inputs. A recent trend has been the merging of general language LLMs and image-generation models, where the text LLM seamlessly calls a content-generating model to create or analyze images as part of a general text response; Bing AI / Chat (based on ChatGPT), which uses the DALL-E diffusion engine to generate images, is one of many current examples.
  • Translation. LLMs are used for translation. For instance, ChatGPT, developed by OpenAI, can translate written text into another language, as can Google Translate, which uses LLMs to translate text between multiple languages.
  • Optical Character Recognition. LLM-enhanced OCR is a particularly successful use case because many of the mistakes made by the original OCR decoder can be corrected by the LLM, which understands how the text should have decoded from the scanned image. Among others, Copilot (using GPT) is particularly powerful in this regard (in this author's experience), but this is now likely a fairly general capability, at least among the larger LLMs.
  • Document summary and interpretation. LLMs can be used to summarize submitted text. For instance, BART, developed by Facebook, can summarize long texts into shorter ones. Another example is T5, developed by Google, which can summarize texts in multiple languages. These were earlier examples, and since this section was originally written the ability to summarize documents has become a feature of virtually every LLM, including the smaller open-source models (see the short sketch after this list). Many have now progressed to the ability to analyze images and summarize or interpret those as well.
  • Conversation and chat. Most general text LLMs these days can be used for question answering and general conversation. For example, GPT-3 and GPT-4, developed by OpenAI, can answer questions in multiple languages based on a given prompt. An earlier example was Turing-NLG, an LLM that could answer questions in multiple languages, but this skill is now enjoyed by many of the current releases of larger models, although most are restricted to only a few languages. Of course, a staple use of text LLMs is chatbots, and while some are dedicated to the task, other more generalized systems have been "wrapped" by applications to drive a bot. For example, DialoGPT, developed by Microsoft, can generate responses to user queries in a conversational manner. Another example is Meena, an LLM that has been trained on a massive amount of data and can generate human-like responses to user queries.
  • Sentiment analysis. LLMs are also used for sentiment analysis. For instance, GPT-3 (and GPT-4) can analyze the sentiment of a given text, as can BERT, developed by Google, which can classify the sentiment of a given text as positive or negative. Sentiment analysis is another capability that has spread across most if not all LLMs (see the sketch after this list).
  • Captioning Images. Some models can caption images. For example, CLIP, developed by OpenAI, matches images against textual descriptions and is widely used in captioning and image-search pipelines, while DALL-E works in the other direction, generating images from textual descriptions.
  • Video Generation and understanding. Short video generation from text descriptions, and video interpretation, are the domain of autoregressive video LLMs (like Loong) and Vid-LLMs. Systems in this space include Sora (from OpenAI), Runway, Pika, Magic Hour, Hunyuan, Luma Labs, Domo and Genmo. This space is very much bleeding edge: generated videos are generally limited to around 10 to 30 seconds, and character consistency from one video to the next is still something of a challenge.
  • Text Classification. LLMs have also been used for text classification. For instance, BERT can classify texts into categories such as news, sports, and entertainment. Another example is RoBERTa, an LLM that can be fine-tuned for classification tasks such as sentiment analysis and question answering.
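To give a concrete flavour of the document summary and sentiment analysis items above (as noted in those bullets), here is a minimal Python sketch using the Hugging Face transformers library. The model choices (facebook/bart-large-cnn for summarization and the pipeline's default DistilBERT sentiment classifier) are convenient, freely downloadable examples rather than endorsements.

```python
# pip install transformers torch
from transformers import pipeline

# Summarization with BART, the model mentioned in the list above.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = (
    "Large language models are trained on vast amounts of text and can generate "
    "coherent, fluent prose. Wrapped in agents, they power chatbots, coding "
    "assistants, translators and summarization tools across many industries."
)
print(summarizer(article, max_length=40, min_length=10, do_sample=False)[0]["summary_text"])

# Sentiment analysis with the pipeline's default BERT-family classifier.
classifier = pipeline("sentiment-analysis")
print(classifier("The summary it produced was clear and saved me an hour of reading."))
```

Each pipeline call downloads its model on first use and then runs locally, which is exactly the pattern described earlier for open-source models.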

If you want to learn more about LLMs and their applications, we recommend this article on Beebom, which lists many of the best large language models of 2023 (as of the time of original writing) with detailed descriptions and use cases. At the time of the update to this page (2025) that article is probably a little dated.

LLMs and their agents represent an exciting area of research in AI technology. With the development of large-scale pre-trained language models and new solutions appearing on the internet that use prompts to create an output, we are seeing new possibilities for automating tasks and creating engaging experiences for users. The applications of LLMs are diverse and include chatbots, search engines, summarization tools, code generation, content creation, translation, and more. However, we must also be mindful of the challenges associated with these technologies. 


...Next: Image Content & Processing LLMs....