July 8, 2024
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success has led to a large influx of research contributions on the topic. These works encompass diverse areas such as architectural innovations, better training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of advances in this direction. Given the rapidly growing body of literature on LLMs, it is imperative that the research community can benefit from a concise yet comprehensive overview of recent developments in this field. This article provides an overview of the existing literature on a broad range of LLM-related concepts. Our self-contained overview of LLMs discusses the relevant background concepts and covers advanced topics at the frontier of LLM research. This review article is intended to provide not only a systematic survey but also a quick, comprehensive reference for researchers and practitioners to draw insights from extensive, informative summaries of existing work to advance LLM research.
The article discusses the significant role of language in human and machine interaction, focusing on the need for generalized models that can handle complex language tasks such as translation, summarization, information retrieval, and conversational interaction. Recent breakthroughs in language models are attributed mainly to the development of transformers, enhanced computational capabilities, and the availability of large-scale training data. These advancements have led to the creation of Large Language Models (LLMs) that perform at close to human level on a variety of tasks.
LLMs are at the forefront of artificial intelligence systems, capable of processing and generating coherent text and adapting to multiple tasks. The history of natural language processing (NLP) has evolved from statistical models to neural language modeling and then to pre-trained language models (PLMs), eventually leading to LLMs. Traditional language modeling was task-specific and supervised, whereas PLMs are trained in a self-supervised setting on large text corpora with the aim of learning generic representations that transfer across NLP tasks. Fine-tuning PLMs for specific downstream tasks has been shown to surpass traditional, task-specific language modeling. The transition from PLMs to LLMs involved a significant increase in model parameters and training data.
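To make the self-supervised objective concrete, below is a minimal PyTorch sketch of next-token prediction (causal language modeling) on a toy corpus; the tiny model and vocabulary are illustrative placeholders rather than any specific PLM.

```python
import torch
import torch.nn as nn

# Toy vocabulary and corpus (stand-ins for a large-scale text corpus).
vocab = {"<pad>": 0, "the": 1, "model": 2, "predicts": 3, "next": 4, "token": 5}
corpus = torch.tensor([[1, 2, 3, 1, 4, 5]])  # "the model predicts the next token"

class TinyCausalLM(nn.Module):
    """A minimal embedding + LSTM language model, used only to illustrate the objective."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # logits over the vocabulary at each position

model = TinyCausalLM(len(vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Self-supervised training: the target at position t is simply the token at t + 1.
inputs, targets = corpus[:, :-1], corpus[:, 1:]
for step in range(100):
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.3f}")
```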
There has been a growing trend in the release of LLMs, with notable examples including T5 and mT5, which relied on transfer learning. GPT-3 demonstrated that LLMs can be applied zero-shot to downstream tasks without fine-tuning. Although pre-trained LLMs sometimes fail to follow user intent in zero-shot settings, fine-tuning on task-instruction data and aligning with human preferences enhance their performance and reduce misaligned behavior.
LLMs exhibit emergent abilities such as reasoning, planning, decision-making, in-context learning, and zero-shot answering, which arise as a consequence of their large scale. These abilities have broadened their adoption in fields such as robotics, tool manipulation, question answering, and autonomous agents. Improvements in these areas have been achieved through task-specific training or better prompting.
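In-context learning can be illustrated with a simple few-shot prompt: labeled demonstrations are concatenated ahead of the query and the model completes the pattern without any weight updates. The `generate` call below is a hypothetical stand-in for whatever LLM completion API is being used.

```python
# Minimal sketch of few-shot (in-context) prompting; `generate` is a hypothetical
# stand-in for an LLM completion call and is not tied to any specific API.
def build_few_shot_prompt(demonstrations, query):
    """Concatenate labeled demonstrations ahead of the new query."""
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in demonstrations]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

demonstrations = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_few_shot_prompt(demonstrations, "A beautifully shot but hollow film.")
print(prompt)
# response = generate(prompt)  # the model infers the task purely from the prompt
```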
Despite their capabilities, LLMs face challenges such as slow training and inference times, extensive hardware requirements, and high running costs. These challenges limit their widespread adoption and have led to research in developing better architectures and training strategies. Methods like parameter-efficient tuning, pruning, quantization, knowledge distillation, and context length interpolation have been studied for efficient LLM utilization.
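As one illustration of the efficiency techniques mentioned above, a low-rank adapter (LoRA-style) layer freezes the pre-trained weights and trains only a small low-rank update. The sketch below is a simplified illustration under that assumption, not a production implementation of any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)        # freeze the pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Only lora_A and lora_B receive gradients; the base projection stays fixed.
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Example: wrap a "pre-trained" projection and check how few parameters are trainable.
base = nn.Linear(1024, 1024)
layer = LoRALinear(base, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")
out = layer(torch.randn(2, 1024))
```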
The success of LLMs across a variety of tasks has led to a surge in LLM-related research, which has been organized into both general and topic-specific surveys. This article aims to provide a comprehensive yet concise overview of the general direction of LLM research, covering architectural and training details of pre-trained LLMs and discussing concepts such as fine-tuning, multi-modal LLMs, robotics, augmented LLMs, datasets, evaluation, and more.
The article presents a broader overview of LLMs, dividing the literature into five branches:
1. Training
2. Inference
3. Evaluation
4. Applications
5. Challenges
A basic flow diagram depicts the various stages of LLMs, from pre-training to prompting/utilization. LLMs can be prompted to generate responses at different training stages, i.e., after pre-training, instruction tuning, or alignment tuning.
The article further reviews various well-known pre-trained LLMs, discussing their architectures, training objectives, datasets, and fine-tuning details. Examples include T5, GPT-3, and mT5, each with its own design choices and applications in natural language understanding and generation.
A flow diagram of retrieval-augmented LLMs illustrates the process: the retriever extracts context similar to the input and forwards it to the LLM, either as plain text or encoded through Fusion-in-Decoder (FiD). Depending on the task, retrieval and generation may repeat multiple times.
Retrieval Augmentation: This process augments LLMs with externally retrieved information so that responses are grounded in relevant context rather than relying solely on the model's parametric knowledge, reducing factual errors. The retrieved context can be supplied in two ways: appended to the prompt as plain text or encoded through Fusion-in-Decoder (FiD).
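A minimal sketch of the plain-text variant is shown below: the retriever scores documents against the query by embedding similarity and the top match is prepended to the prompt. The `embed` and `generate` functions are hypothetical placeholders for an embedding model and an LLM, not part of any particular library.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder for an embedding model; here, a toy hash-seeded vector
    (not semantically meaningful, only used to make the pipeline runnable)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(128)

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents whose embeddings are most similar (cosine) to the query."""
    q = embed(query)
    def score(doc):
        d = embed(doc)
        return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
    return sorted(documents, key=score, reverse=True)[:k]

documents = [
    "The Eiffel Tower is 330 metres tall.",
    "Mount Everest is the highest mountain above sea level.",
]
query = "How tall is the Eiffel Tower?"
context = "\n".join(retrieve(query, documents))

# Plain-text augmentation: the retrieved context is simply prepended to the prompt.
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
# answer = generate(prompt)  # hypothetical LLM call; FiD would instead encode each passage separately
```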
Tool Augmented LLMs: These models utilize external tools to enhance their performance, which is particularly useful for tasks requiring planning and execution beyond language processing. A flow diagram (Fig. 13) illustrates how an LLM can use various tools to generate an output, including accessing locally stored memory, executing tasks, interacting with external APIs, and updating information based on feedback. The article also discusses notable examples of such tool-augmented models.
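The tool-use loop described above can be illustrated roughly as follows: the model's output is parsed for a tool request, the tool is executed, and its result is fed back for a final answer. The tool registry, the `CALL:` convention, and the `generate` function are all illustrative assumptions, not a description of any specific system.

```python
# Rough sketch of a tool-augmented generation loop. `generate` is a hypothetical LLM
# call, and the "CALL: tool(arg)" convention is an illustrative protocol, not a standard.
import re

def calculator(expression: str) -> str:
    """A simple arithmetic tool exposed to the model."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def generate(prompt: str) -> str:
    """Placeholder LLM: pretend the model requests the calculator on the first turn."""
    if "Observation:" not in prompt:
        return "CALL: calculator(23 * 47)"
    return "The product of 23 and 47 is 1081."

def run_with_tools(question: str, max_turns: int = 3) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_turns):
        output = generate(prompt)
        match = re.match(r"CALL: (\w+)\((.*)\)", output)
        if match is None:
            return output                        # no tool requested: final answer
        tool, arg = match.group(1), match.group(2)
        observation = TOOLS[tool](arg)           # execute the external tool
        prompt += f"{output}\nObservation: {observation}\n"  # feed the result back
    return output

print(run_with_tools("What is 23 times 47?"))
```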
These augmentation strategies aim to improve LLMs' ability to handle complex tasks by combining their natural language capabilities with information retrieval and tool use, making them more versatile and better aligned with user intent and complex task requirements.