This is a completely wrong approach (as you've already figured out).
As stated in the official OpenAI documentation:
> Some common use cases where fine-tuning can improve results:
>
> - Setting the style, tone, format, or other qualitative aspects
> - Improving reliability at producing a desired output
> - Correcting failures to follow complex prompts
> - Handling many edge cases in specific ways
> - Performing a new skill or task that’s hard to articulate in a prompt
Fine-tuning is not about answering a specific question with a specific answer from the fine-tuning dataset.
What you need to implement is a semantic search based on embeddings, as stated in the official OpenAI documentation:
> **When should I use fine-tuning vs embeddings with retrieval?**
>
> Embeddings with retrieval is best suited for cases when you need to have a large database of documents with relevant context and information.
>
> By default OpenAI’s models are trained to be helpful generalist assistants. Fine-tuning can be used to make a model which is narrowly focused, and exhibits specific ingrained behavior patterns. Retrieval strategies can be used to make new information available to a model by providing it with relevant context before generating its response. Retrieval strategies are not an alternative to fine-tuning and can in fact be complementary to it.
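In code, that retrieval workflow boils down to: embed each document once, embed the incoming question, pick the most similar document, and pass it to the chat model as context. Here is a minimal sketch, assuming the `openai` Python package v1+; the model names, documents, and question are placeholders:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    """Return the embedding vector for a piece of text."""
    response = client.embeddings.create(model="text-embedding-3-small", input=[text])
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1. Embed your knowledge base once (placeholder documents).
documents = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available Monday to Friday, 9am-5pm CET.",
]
document_embeddings = [embed(doc) for doc in documents]

# 2. Embed the user's question and retrieve the most similar document.
question = "What is your refund policy?"
question_embedding = embed(question)
best_index = max(
    range(len(documents)),
    key=lambda i: cosine_similarity(question_embedding, document_embeddings[i]),
)
context = documents[best_index]

# 3. Ask the chat model to answer from the retrieved context.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(completion.choices[0].message.content)
```

In production you would store the embeddings in a vector database instead of recomputing them on every run, but the flow is the same.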
You have two options:

- A custom solution along the lines of the sketch above (see my past answer).
- A framework such as LlamaIndex or LangChain, which handles the chunking, embedding, and retrieval for you (a minimal LlamaIndex sketch follows).
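
For the second option, here is a minimal LlamaIndex sketch, assuming llama-index >= 0.10 (imports moved to `llama_index.core` in that release) and a local `./data` folder of documents; the folder path and question are placeholders:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load and chunk the documents, embed them, and build an in-memory vector index.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# The query engine retrieves the most relevant chunks and has the LLM
# answer from them (retrieval-augmented generation).
query_engine = index.as_query_engine()
response = query_engine.query("What is your refund policy?")
print(response)
```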