I have a SQL table containing a huge amount of data, and I need to train ChatGPT on the SQL table data using the Chat Completion API.
I tried generating SQL queries with ChatGPT, but that doesn't work as expected. Sometimes it generates an inappropriate query.
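For reference, this is roughly what I'm trying now (the table and columns below are simplified placeholders; my real schema is much larger):

```python
import openai

openai.api_key = "sk-..."  # my API key

# Placeholder schema; the real table is much bigger.
schema = "employees(id INT, name VARCHAR, department VARCHAR, salary INT)"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": f"You write SQL queries for this table: {schema}"},
        {"role": "user",
         "content": "Show the highest-paid employee in each department."},
    ],
)
print(response["choices"][0]["message"]["content"])
```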
UPDATE: 22 August 2023
Fine-tuning for GPT-3.5 Turbo is now available, as stated in the official OpenAI blog:
Fine-tuning for GPT-3.5 Turbo is now available, with fine-tuning for GPT-4 coming this fall. This update gives developers the ability to customize models that perform better for their use cases and run these custom models at scale. Early tests have shown a fine-tuned version of GPT-3.5 Turbo can match, or even outperform, base GPT-4-level capabilities on certain narrow tasks. As with all our APIs, data sent in and out of the fine-tuning API is owned by the customer and is not used by OpenAI, or any other organization, to train other models.
Also, two new models (i.e., `davinci-002` and `babbage-002`) were introduced as replacements for the GPT-3 models (`davinci`, `curie`, `babbage`, and `ada`), as those GPT-3 models will be turned off on January 4th, 2024. Consequently, if you take a look at the official OpenAI documentation now, you'll see the following:
What models can be fine-tuned?
Fine-tuning is currently available for the following models:
- `gpt-3.5-turbo-0613` (recommended)
- `babbage-002`
- `davinci-002`

We expect `gpt-3.5-turbo` to be the right model for most users in terms of results and ease of use, unless you are migrating a legacy fine-tuned model.
Note: GPT-3 models were already removed from the list, but they should still be available for fine-tuning until January 4th, 2024.
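For illustration, here is a minimal sketch of creating such a fine-tuning job with the openai Python library (0.x API style; the file name and training examples are placeholders):

```python
import openai

openai.api_key = "sk-..."  # replace with your API key

# Each line of the JSONL file is a chat-formatted training example:
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
uploaded = openai.File.create(
    file=open("training_data.jsonl", "rb"),  # placeholder file name
    purpose="fine-tune",
)

# The uploaded file may take a few minutes to process before the
# fine-tuning job can be created.
job = openai.FineTuningJob.create(
    training_file=uploaded["id"],
    model="gpt-3.5-turbo-0613",
)
print(job["id"], job["status"])
```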
You can't fine-tune the `gpt-3.5-turbo` model.
As stated in the official OpenAI documentation:
What models can be fine-tuned?
Fine-tuning is currently only available for the following base models: `davinci`, `curie`, `babbage`, and `ada`. These are the original models that do not have any instruction following training (like `text-davinci-003` does for example). You are also able to continue fine-tuning a fine-tuned model to add additional data without having to start from scratch.
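For completeness, a minimal sketch of a legacy fine-tune of one of those base models (again assuming the openai Python library, 0.x API style; the file name and data are placeholders):

```python
import openai

openai.api_key = "sk-..."  # replace with your API key

# Legacy fine-tuning expects prompt/completion-formatted JSONL:
# {"prompt": "...", "completion": "..."}
uploaded = openai.File.create(
    file=open("train.jsonl", "rb"),  # placeholder file name
    purpose="fine-tune",
)

fine_tune = openai.FineTune.create(
    training_file=uploaded["id"],
    model="davinci",
)
print(fine_tune["id"], fine_tune["status"])
```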
GPT-3.5 models are not available for fine-tuning right now, but GPT-3 models can be fine-tuned. As the documentation says:
Fine-tuning is currently only available for the following base models: davinci, curie, babbage, and ada. These are the original models that do not have any instruction following training (like text-davinci-003 does for example).
Please see the documentation for details: https://platform.openai.com/docs/guides/fine-tuning
As of August 22nd, 2023, you can now fine-tune GPT-3.5 Turbo models: https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates
The other answers to this question are now outdated in this respect.
Your use case of fine-tuning to indirectly ‘train’ the model on your database schema via examples does seem like a valid fine-tuning use case.
However, it's worth pointing out that fine-tuning is NOT the mechanism for training the model on large datasets to improve its knowledge base / ‘pre-trained data’. For those use cases, I would suggest instead using a vector database to store the contextual data (an embedding model can convert text/documents to vectors), using a retriever mechanism (e.g., LangChain) to fetch the context relevant to a given query, and then including that retrieved context along with the original query in the prompt sent to the LLM's chat/completion API; a sketch of this flow follows below.
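For illustration, a minimal sketch of that retrieval flow using the openai Python library (0.x) directly, with an in-memory list standing in for a real vector database; the chunks and model names are placeholder assumptions:

```python
import numpy as np
import openai

openai.api_key = "sk-..."  # replace with your API key

# Hypothetical context chunks, e.g. schema descriptions or rows exported
# from the SQL table; in practice these would live in a vector database.
chunks = [
    "Table employees: id, name, department, salary.",
    "Table orders: id, customer_id, total, created_at.",
]

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

chunk_vectors = [embed(c) for c in chunks]

query = "Which table stores salaries?"
q_vec = embed(query)

# Retrieve the most relevant chunk by cosine similarity.
scores = [np.dot(q_vec, v) / (np.linalg.norm(q_vec) * np.linalg.norm(v))
          for v in chunk_vectors]
context = chunks[int(np.argmax(scores))]

# Include the retrieved context alongside the original query.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"Answer using this context: {context}"},
        {"role": "user", "content": query},
    ],
)
print(response["choices"][0]["message"]["content"])
```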