Prompt

Please provide a document called "The Forgotten LLMs". Your objective here is to list LLMs that haven't captured fame as much as some of the others, but which are still interesting

Output

The Forgotten LLMs

While many Large Language Models (LLMs) like GPT-3 and BERT have gained widespread recognition, several other models have made significant contributions to the field but haven't captured as much fame. Here is an overview of some of these lesser-known yet interesting LLMs.

Alpaca 7B

Developer: Stanford CRFM
Release Date: March 2023
Description: Alpaca is a 7-billion-parameter model fine-tuned from Meta's LLaMA 7B. Despite its small size, it performs comparably to OpenAI's much larger text-davinci-003 (the GPT-3.5-era model behind early ChatGPT) on instruction-following tasks. Alpaca is released for research purposes only, emphasizing its role in academic exploration rather than commercial use[3].

DBRX

Developer: Databricks (Mosaic ML team)
Release Date: March 27, 2024
Description: DBRX is an open-source LLM with a mixture-of-experts architecture totalling 132 billion parameters. For any given input, a router activates only a subset of those parameters (roughly 36 billion, spread across a few of its 16 experts), keeping inference efficient while still drawing on a large overall parameter pool[3].
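
To make the "only a subset of parameters is active" idea concrete, here is a minimal, illustrative sketch of top-k mixture-of-experts routing in PyTorch. It is not DBRX's actual implementation; the layer sizes and expert counts are arbitrary assumptions chosen for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Toy top-k mixture-of-experts feed-forward layer (illustrative, not DBRX's code)."""

    def __init__(self, d_model=64, d_hidden=128, n_experts=16, k=4):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.router(x)                              # (num_tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # keep only k experts per token
        weights = F.softmax(topk_scores, dim=-1)             # normalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out                                           # only k of n_experts ran per token

tokens = torch.randn(8, 64)
print(TinyMoELayer()(tokens).shape)  # torch.Size([8, 64])
```

Only the experts selected by the router are evaluated for each token, which is why a model with a very large total parameter count can still have a much smaller per-token compute cost.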

Falcon

Developer: Technology Innovation Institute (TII)
Release Date: 2023
Description: Falcon is designed for diverse applications such as chatbots, language translation, and content creation. It offers models with 7 billion and 40 billion parameters and supports fine-tuning for specific tasks. Its open-source nature makes it accessible for various research and development projects[2].

Guanaco 65B

Developer: Tim Dettmers and collaborators
Release Date: 2023
Description: Built on the LLaMA base models, Guanaco-65B is a fine-tuned chatbot model intended for research purposes. It showcases QLoRA, a fine-tuning technique that keeps the base model frozen in 4-bit precision and trains only small low-rank adapters, making it practical to tune even a 65-billion-parameter model on modest hardware. The adapter weights are openly released for research use[2].
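
Below is a minimal sketch of a QLoRA-style setup using the Hugging Face transformers and peft libraries: the base model is loaded in 4-bit precision and only small adapter matrices are trainable. The base-model ID and LoRA hyperparameters are illustrative assumptions, not the Guanaco authors' exact configuration.

```python
# Requires: transformers, peft, bitsandbytes, accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_id = "huggyllama/llama-7b"  # assumed placeholder; Guanaco-65B was built on LLaMA-65B

# Load the frozen base weights in 4-bit (NF4) precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach small low-rank adapters; only these are updated during fine-tuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; illustrative choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows that only a tiny fraction of weights is trainable
```

From here a standard causal-language-modeling training loop (or the transformers Trainer) would update just the adapter weights, which is what keeps the memory footprint small.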

Mistral

Developer: Mistral AI
Release Date: September 2023
Description: Mistral 7B is a compact model with 7 billion parameters that outperforms similarly sized, and even somewhat larger, open models such as Llama 2 13B on many benchmarks. Its small size enables efficient self-hosting while maintaining competent performance for business applications[1].
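
As a rough illustration of what "efficient self-hosting" looks like in practice, the sketch below loads a Mistral 7B checkpoint with the Hugging Face transformers library and generates a completion. The checkpoint name and generation settings are assumptions, and a GPU with sufficient memory (or further quantization) is still required.

```python
# Requires: transformers, accelerate, and a GPU with enough memory for a 7B model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed publicly released instruct checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
    device_map="auto",           # place layers on available devices automatically
)

prompt = "Summarize the benefits of self-hosting a 7-billion-parameter model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```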

Orca

Developer: Microsoft Research
Release Date: June 2023 (research paper)
Description: Orca aims to demonstrate strong reasoning abilities by learning to imitate the step-by-step reasoning traces of larger models such as GPT-4. Despite having only 13 billion parameters, it narrows the gap with much larger instruction-tuned models such as ChatGPT on reasoning benchmarks, showcasing the potential of improved training methods for smaller models[2].
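
A highly simplified sketch of the idea behind this kind of training data: pair each question with a system instruction that asks a stronger "teacher" model for a step-by-step explanation, then train the smaller model on the resulting traces. The record format and the query_teacher function below are hypothetical stand-ins, not Microsoft's actual pipeline.

```python
# Hypothetical sketch of explanation-tuning data construction, in the spirit of Orca.
# query_teacher is a placeholder for a call to a stronger model (e.g. GPT-4); it is not a real API.
from typing import Callable

SYSTEM_INSTRUCTION = (
    "You are a helpful assistant. Think step by step and explain your reasoning "
    "before giving the final answer."
)

def build_training_record(question: str, query_teacher: Callable[[str, str], str]) -> dict:
    """Ask the teacher model for a reasoning trace and package it as one training example."""
    explanation = query_teacher(SYSTEM_INSTRUCTION, question)  # detailed step-by-step answer
    return {
        "system": SYSTEM_INSTRUCTION,
        "user": question,
        "assistant": explanation,  # the smaller model is trained to imitate this trace
    }

# Example usage with a dummy teacher, just to show the record shape.
dummy_teacher = lambda system, user: "Step 1: ... Step 2: ... Final answer: ..."
print(build_training_record("Why does ice float on water?", dummy_teacher))
```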

Pythia

Developer: EleutherAI
Release Date: February 13, 2023
Description: Pythia is a series of open-source models ranging from 70 million to 12 billion parameters. They serve as foundational bases for fine-tuning into instruction-following LLMs such as Dolly 2.0 by Databricks, emphasizing flexibility and accessibility[3].

StableLM

Developer: Stability AI
Release Date: April 2023 (initial Alpha release)
Description: The StableLM series launched with 3-billion- and 7-billion-parameter Alpha models, with considerably larger models planned. Developed by the creators of Stable Diffusion, these models aim to be transparent and accessible while supporting a wide range of applications, from text generation to more complex tasks[1].

Vicuna 33B

Developer: LMSYS
Release Date: 2023
Description: Vicuna is derived from Meta's LLaMA and fine-tuned on user-shared conversations collected from ShareGPT.com. With 33 billion parameters, it serves as an effective model for research on large language models and chatbots, despite being less capable than some larger counterparts[1][2].

These "forgotten" LLMs highlight the diversity and innovation within the AI community. They demonstrate that even less-publicized models can offer unique features and capabilities that contribute significantly to advancements in natural language processing and AI research.

The above text was generated by a large language model (LLM) and its accuracy has not been validated. This page is part of 'LLMs-on-LLMs', a GitHub repository by Daniel Rosehill that explores how curious humans can use LLMs to better their understanding of LLMs and AI. The information should not be regarded as authoritative and, given the fast pace of evolution in LLM technology, will eventually become outdated. This footer was added on 16-Nov-2024.