How do LLMs "store" info?
Prompt
Let's say that I download a local LLM and ask it something like:
"What color is the sky?"
It will tell me something like:
"The sky is blue"
Where is this information actually "coming" from?
Generation Details
| Date | LLM | Access Method |
|---|---|---|
| Nov 2024 | | Web UI |
How Knowledge is Stored in an LLM
- Training Data and Parameters: The LLM is trained on vast amounts of text, including documents, articles, books, and websites. During training, the model does not store this text verbatim. Instead, it learns patterns from it: relationships between words, grammatical regularities, factual associations, and common-sense knowledge. These patterns are encoded in the model's parameters (weights), which are adjusted incrementally to improve the model's ability to predict the next word in a sequence from the preceding words[6]. A toy sketch of this training objective appears after this list.
- Pattern Recognition: When you ask about the color of the sky, the LLM uses these learned patterns to infer that "blue" is the most likely answer, because during training it encountered many contexts in which "sky" is associated with "blue"[5][6]. This knowledge is not stored as a discrete fact, as it would be in a database, but as a probability distribution over the words that follow a given context: "blue" has a high probability of following a phrase like "the sky is" because of these learned associations.
- Factual Knowledge Acquisition: Acquiring factual knowledge during training amounts to progressively increasing the probability the model assigns to facts it encounters repeatedly in the training data[7]. If many sources state that the sky is blue, the model's parameters shift to reflect that association more strongly. The resulting knowledge is generalized across many examples rather than tied to any specific document or sentence[1][7].
- Inference: When you prompt the downloaded model offline, it uses these stored patterns to generate an answer by predicting which word (or sequence of words) is most likely to follow the prior context. Asked about the sky's color, it predicts "blue" because that matches what it has learned from its training data[6]. The second sketch after this list shows how to inspect this next-word distribution directly.
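To make the training objective concrete, here is a minimal toy sketch in Python (PyTorch). It is nothing like a real LLM in scale: the five-word vocabulary, layer sizes, and learning rate are all illustrative assumptions. What it demonstrates is that after training, the association between "the sky is" and "blue" exists only as numeric weights, not as stored text.

```python
# Toy illustration (not a real LLM): a tiny next-token model trained on one
# context/next-word pair. The "knowledge" that "blue" follows "the sky is"
# ends up only in the weights of the embedding and linear layers.
import torch
import torch.nn as nn

vocab = ["<pad>", "the", "sky", "is", "blue"]
tok = {w: i for i, w in enumerate(vocab)}

# One training pair: context "the sky is" -> next word "blue"
context = torch.tensor([[tok["the"], tok["sky"], tok["is"]]])
target = torch.tensor([tok["blue"]])

model = nn.Sequential(
    nn.Embedding(len(vocab), 16),
    nn.Flatten(),
    nn.Linear(16 * 3, len(vocab)),  # scores (logits) over the vocabulary
)
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(50):
    logits = model(context)
    loss = loss_fn(logits, target)   # penalizes a low probability for "blue"
    opt.zero_grad()
    loss.backward()                  # gradients nudge the weights slightly
    opt.step()

probs = torch.softmax(model(context), dim=-1)
print(f"P('blue' | 'the sky is') = {probs[0, tok['blue']]:.3f}")  # should approach 1.0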
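To see the inference step directly, the second sketch prints a downloaded model's probability distribution over the next token after "The sky is". It assumes the Hugging Face transformers library; "gpt2" is just a placeholder for whatever small causal language model you have locally.

```python
# Inspect the next-token distribution of a locally downloaded model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM available locally
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The sky is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# Probability distribution over the *next* token after "The sky is"
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for p, idx in zip(top.values, top.indices):
    # " blue" typically appears among the top candidates
    print(f"{tokenizer.decode(int(idx)):>10s}  {p.item():.3f}")
```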
Where Does This Knowledge Come From?
The knowledge comes from:
- Training Data: The vast corpus of text used during pre-training contains examples where "sky" and "blue" are frequently associated.
- Model Parameters: The patterns learned from this data are encoded into the billions (or trillions) of parameters in the neural network. These parameters don't store individual facts; they represent complex relationships between words and concepts that allow the model to produce answers at inference time[5][6].
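A short sketch makes that point tangible: what you download is essentially a large collection of numeric weights plus a tokenizer, not a database of sentences. It again assumes the transformers library, with "gpt2" as an example model.

```python
# A downloaded model is a set of numeric tensors, not stored documents.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example small local model

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # roughly 124 million weights for GPT-2 small

# Every parameter is just a floating-point number; none of them contains text.
first_weight = next(model.parameters()).flatten()[0].item()
print(type(first_weight), first_weight)
```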
In summary, when you download an LLM for offline use, all foundational knowledge (like "the sky is blue") is embedded within the model's parameters as statistical patterns learned during training. The model generates responses by recognizing and applying these patterns rather than recalling specific pieces of raw data from its training set.
Citations:
[1] Knowledge Storage and Extraction in Language Models (Part A) https://openreview.net/forum?id=rVnxymbsOS
[2] How much knowledge is "hidden" within an LLM? : r/LocalLLaMA https://www.reddit.com/r/LocalLLaMA/comments/197dbs4/how_much_knowledge_is_hidden_within_an_llm/
[3] [D] All state of the art LLMs make factual mistakes at the amateur ... https://www.reddit.com/r/MachineLearning/comments/1bdi4nr/d_all_state_of_the_art_llms_make_factual_mistakes/
[4] The Beginner's Guide to LLM Prompting | Haystack - Deepset https://haystack.deepset.ai/blog/beginners-guide-to-llm-prompting
[5] Memorizing Documents with Guidance in Large Language Models https://arxiv.org/html/2406.15996v1
[6] How Does a Large Language Model Actually Work? | Entry Point AI https://www.entrypointai.com/blog/how-large-language-models-work/
[7] How Do Large Language Models Acquire Factual Knowledge ... https://arxiv.org/html/2406.11813v1
The above text was generated by a large language model (LLM) and its accuracy has not been validated. This page is part of 'LLMs-on-LLMs,' a GitHub repository by Daniel Rosehill which explores how curious humans can use LLMs to better their understanding of LLMs and AI. The information should not be regarded as authoritative, and given the fast pace of evolution in LLM technology it will eventually become outdated. This footer was added on 16-Nov-2024.