
I'm trying out OpenAI fine-tuning.

I prepared the training data and ran fine_tunes.create. Several minutes later, it showed Stream interrupted (client disconnected):

$ openai api fine_tunes.create -t data_prepared.jsonl
Upload progress: 100%|██████████████████████████████████████████████| 47.2k/47.2k [00:00<00:00, 44.3Mit/s]
Uploaded file from data_prepared.jsonl: file-r6dbTH7rVsp6jJMgbX0L0bZx
Created fine-tune: ft-JRGzkYfXm7wnScUxRSBA2M2h
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-12-02 11:10:08] Created fine-tune: ft-JRGzkYfXm7wnScUxRSBA2M2h
[2022-12-02 11:10:23] Fine-tune costs $0.06
[2022-12-02 11:10:24] Fine-tune enqueued. Queue number: 11

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-JRGzkYfXm7wnScUxRSBA2M2h

I then tried fine_tunes.follow; several minutes later, it failed in the same way:

$ openai api fine_tunes.follow -i ft-JRGzkYfXm7wnScUxRSBA2M2h
[2022-12-02 11:10:08] Created fine-tune: ft-JRGzkYfXm7wnScUxRSBA2M2h
[2022-12-02 11:10:23] Fine-tune costs $0.06
[2022-12-02 11:10:24] Fine-tune enqueued. Queue number: 11

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-JRGzkYfXm7wnScUxRSBA2M2h

openai api fine_tunes.list showed:

$ openai api fine_tunes.list
{
  "data": [
    {
      "created_at": 1669975808,
      "fine_tuned_model": null,
      "hyperparams": {
        "batch_size": 2,
        "learning_rate_multiplier": 0.1,
        "n_epochs": 4,
        "prompt_loss_weight": 0.01
      },
      "id": "ft-JRGzkYfXm7wnScUxRSBA2M2h",
      "model": "curie",
      "object": "fine-tune",
      "organization_id": "org-YyoQqNIrjGHYDnKt9t3T6x2J",
      "result_files": [],
      "status": "pending",
      "training_files": [
        {
          "bytes": 47174,
          "created_at": 1669975808,
          "filename": "data_prepared.jsonl",
          "id": "file-r6dbTH7rVsp6jJMgbX0L0bZx",
          "object": "file",
          "purpose": "fine-tune",
          "status": "processed",
          "status_details": null
        }
      ],
      "updated_at": 1669975824,
      "validation_files": []
    }
  ],
  "object": "list"
}

Running openai api completions.create -m ft-JRGzkYfXm7wnScUxRSBA2M2h -p aprompt returned Error: That model does not exist (HTTP status code: 404).

Could anyone help?

SoftTimur
  • Experiencing same issue. Could it be that queues are overcrowded, thus leading queue time to exceed some timeout? – user2398029 Dec 06 '22 at 00:35

5 Answers


Apparently, there is a problem with the OpenAI API (see this Reddit post and this GitHub issue). I downgraded the openai package to version 0.25.0 by running
pip install openai==0.25.0
That fixed it for me. To be fair, you can expect this to be fixed in the future.

Raste
  • Is this supposed to be fixed yet? I am continually getting this issue. – MattG Mar 13 '23 at 12:23
  • As of today, it does not seem to be fixed yet, although I didn't do any more fine-tunes to test it. Until it is fixed, I recommend sticking with version 0.25, at least for fine-tuning. After you have finished your fine-tuning, you can revert and use your fine-tuned model with the current version. – Raste Mar 15 '23 at 10:23
  • After about a minute, the fine-tuning disconnects, even on version 0.25. Seems like the issue is server-side. – Antoine Neidecker Mar 23 '23 at 08:58
  • I don't think you need to stay connected for your fine-tuning. Once you've uploaded the files, OpenAI will do the rest for you. If I remember correctly, there is a command to reconnect to the output stream of your fine-tuning as well. Once your fine-tuning is finished, you can use the CLI to view your fine-tuned models. Another way to see it is to log into the OpenAI Playground and choose your fine-tuned model on the right side. – Raste Apr 06 '23 at 12:34

It was a temporary issue on OpenAI's side; the team has fixed it.

SoftTimur

The good news is that the stream interruption only prevents you from viewing progress; it does not stop the fine-tuning itself. Your job is already in the queue.

The bad news is that you never know how long it will take. According to https://platform.openai.com/docs/guides/fine-tuning:

Streams events until the job is done (this often takes minutes, but can take hours if there are many jobs in the queue or your dataset is large)

You may periodically check the status of your job using:

openai api fine_tunes.list
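
To avoid reading the whole listing by eye, the status of a single job can be pulled out of that JSON. The snippet below is only a minimal sketch: the helper name fine_tune_status is hypothetical, and it assumes the output keeps the shape shown in the question (a top-level "data" list whose items carry "id" and "status").

```python
import json
from typing import Optional


def fine_tune_status(listing_json: str, job_id: str) -> Optional[str]:
    """Return the "status" of the job with the given ID, or None if it is not listed.

    Hypothetical helper; assumes the JSON layout printed by fine_tunes.list
    in the question ("data" -> list of jobs with "id" and "status").
    """
    listing = json.loads(listing_json)
    for job in listing.get("data", []):
        if job.get("id") == job_id:
            return job.get("status")
    return None


# Trimmed sample shaped like the fine_tunes.list output in the question.
sample = (
    '{"data": [{"id": "ft-JRGzkYfXm7wnScUxRSBA2M2h", '
    '"status": "pending", "fine_tuned_model": null}], "object": "list"}'
)
print(fine_tune_status(sample, "ft-JRGzkYfXm7wnScUxRSBA2M2h"))
```

In practice the JSON string would come from capturing the output of openai api fine_tunes.list instead of the hard-coded sample.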

More info:

https://community.openai.com/t/stream-interrupted-client-disconnected-during-fine-tunes-follow/70334/20

Samuil Banti

I received the same error and ended up creating a Python program to monitor the progress without running the commands manually. Simply change the ID and you are all set to go.

import json
import subprocess
import threading
import time

class FineTuneMonitor:
    def __init__(self):
        self.id = "ft-YOUR_ID_TO_MONITOR"  # Set the ID of your fine-tune process
        # Both commands are built from the same ID so they cannot drift apart.
        self.cmd1 = ["openai", "api", "fine_tunes.follow", "-i", self.id]
        self.cmd2 = ["openai", "api", "fine_tunes.list"]
        self.process = None
        self.success = False  # True once the status is no longer "pending"
        self.done = False     # True once there is nothing left to monitor

    def run_command(self):
        """Follow the event stream; return False if the stream was interrupted."""
        self.process = subprocess.Popen(self.cmd1, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

        while True:
            output = self.process.stdout.readline().decode()

            if not output:
                # Empty read means end of output: the process exited or was killed.
                self.process.wait()
                break

            print(output.strip())

            if 'Stream interrupted' in output:
                self.process.kill()
                return False

        return True

    def monitor_status(self):
        """Poll fine_tunes.list every 10 seconds until the job leaves "pending"."""
        while True:
            output = subprocess.check_output(self.cmd2).decode()
            data = json.loads(output)
            status = next((item["status"] for item in data["data"] if item["id"] == self.id), None)

            if status is None:
                print("Could not find status for the given ID.")
                self.done = True  # Let start() exit instead of retrying forever
                if self.process is not None:
                    self.process.kill()
                return

            print(f"Status: {status}")

            if status != "pending":
                self.success = True
                self.done = True
                if self.process is not None:
                    self.process.kill()
                return

            time.sleep(10)

    def start(self):
        threading.Thread(target=self.monitor_status).start()

        while not self.done:
            self.run_command()
            if not self.done:
                print("Stream ended or was interrupted, retrying in 10 seconds...")
                time.sleep(10)

monitor = FineTuneMonitor()
monitor.start()

This script creates a FineTuneMonitor class that encapsulates the logic for running and monitoring OpenAI commands. The start method starts the status monitoring in a new thread and then repeatedly runs the first command until it completes successfully. The monitor_status method runs the second command every 10 seconds and kills the first command's process if the status changes from "pending".

Replace self.id with the ID of your fine-tune process. The script assumes that the status is located under data -> status in the output of the second command. If the structure of the output changes, you'll need to update the script accordingly.

PanDe

I kept running this command until it got a queue number:

openai api fine_tunes.follow -i <YOUR_FINE_TUNE_JOB_ID>

Ali