Both retrieval-augmented generation (RAG) and fine-tuning can make a language model more useful for a specific domain. They solve different problems, and choosing the wrong one is an expensive mistake.
This is a decision framework, not a theoretical overview. The goal is to give you the questions to ask when you're scoping an AI project and need to decide which approach to invest in.
RAG connects a language model to an external knowledge base at inference time. When a user sends a query, the system retrieves the most relevant documents from the knowledge base and includes them in the prompt. The model generates an answer grounded in those documents.
The key property of RAG is that the knowledge is external and updateable. You can change what's in the knowledge base without changing the model. New documents, updated policies, recent events — all of these can be reflected in RAG responses without any retraining.
Fine-tuning adjusts the weights of a pre-trained model using examples of the behavior you want. The model learns from the examples and encodes that knowledge into its parameters. After fine-tuning, the model behaves differently — it follows a particular format, uses domain-specific terminology, or produces outputs calibrated to your use case.
The key property of fine-tuning is that it changes behavior, not knowledge. It's best for teaching the model how to respond, not for teaching it what to know.
Start with this question: is your problem primarily about access to current or proprietary information, or is it primarily about output style, format, and domain-specific behavior?
If your problem is informational — the model doesn't know your product documentation, your internal policies, your client contracts — RAG is almost always the right answer. It's faster to implement, cheaper to maintain, and the knowledge is auditable (you can see exactly what documents were retrieved for any given response).
If your problem is behavioral — the model writes in a generic style when you need a specific voice, produces inconsistent formats, makes domain errors in how it reasons — fine-tuning addresses those problems more directly.
RAG fails when the retrieval step fails. If your documents aren't indexed well, if the query-document similarity scoring doesn't surface the right content, or if the relevant information is spread across too many documents for effective retrieval, the model won't have what it needs to generate a good answer.
RAG also fails when latency is a constraint. Each RAG query involves at least one retrieval step (often two — one for the query, one to re-rank results) before the generation step. If you need sub-100ms responses, RAG's architecture may not be compatible with your requirements.
Fine-tuning fails when the problem is informational. If a customer asks about a product specification and the model needs to give an accurate answer, fine-tuning can't reliably inject that information — it will hallucinate with high confidence in the fine-tuned style. This is more dangerous than a vanilla model that says it doesn't know.
Fine-tuning also fails when your data distribution shifts. A fine-tuned model trained on last year's examples may degrade significantly when the inputs change. You need fresh training data to retrain, which is expensive and time-consuming.
The most capable production systems often use both. Fine-tune for style and domain behavior, use RAG for current knowledge. A customer support system might be fine-tuned on examples of good support responses (teaching tone, format, and escalation judgment) and also connected to a RAG knowledge base of current product documentation and known issues.
This architecture is more complex to build and maintain, but it separates the concerns cleanly: behavior is trained, knowledge is retrieved.
If you're starting a new AI project and aren't sure which to use: start with RAG. It's faster, cheaper, and produces auditable outputs. Build your eval set, measure performance against your success criteria, and only consider fine-tuning if RAG isn't getting you to the bar you need. Most production use cases that teams initially think require fine-tuning turn out to be solvable with good RAG implementation and prompt engineering.