I'm building a FastAPI endpoint that should stream the ChatCompletion of GPT-3.5 from the OpenAI Python library. Here is my code:

import json

import openai
from asyncer import asyncify  # asyncify from the asyncer package
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/ai_re/")
async def read_item(request: Request):
    base_prompt = "This is a prompt"
    sources = []
    # run the blocking OpenAI call in a worker thread
    response = await asyncify(openai.ChatCompletion.create)(
        model='gpt-3.5-turbo',
        messages=[{"role": "system", "content": base_prompt.strip()}],
        max_tokens=550,
        temperature=0.28,
        stream=True,
        n=1,
    )

    async def event_generator():
        for event in response:
            event_text = event.choices[0].delta.content if "content" in event.choices[0].delta else ""
            event_data = {
                "texte": event_text,
                "source": sources
            }
            yield f"data: {json.dumps(event_data)}\n\n"

    return StreamingResponse(event_generator(), media_type="text/event-stream")

I use asyncify to make the request async; as a simple POST endpoint (no SSE), this works well and doesn't block the main thread.

In this configuration, however, the streaming itself works, but all other endpoints are blocked until this request completes.

I tried:

  • removing asyncify
  • removing async/await in various places, but then I often get `'coroutine' object is not an iterator`

  • https://stackoverflow.com/questions/75740652/fastapi-streamingresponse-not-streaming-with-generator-function – GooJ May 15 '23 at 20:38
  • @GooJ Thanks, but I've already seen that post. My endpoint works; the problem is that it occupies the main thread. – Baudouin Arbarétier May 15 '23 at 20:45
  • @Chris Sorry if it wasn't super clear: if you copy the working example, you will see there is no async in front of the event_generator. Follow some of the links mentioned in that answer and you will learn what async actually does; it doesn't make things run in parallel, as many (I used to as well) think it does from the name. – GooJ May 16 '23 at 21:30
  • I disagree, Chris; this case is different from the ones linked previously. I'll be able to mark this post as solved in 11 hours ("You can accept your own answer in 11 hours"); my solution is posted below. – Baudouin Arbarétier May 17 '23 at 07:56
  • @BaudouinArbarétier Please have a closer look at the answers and the references provided by them. In short, if you are using a `def` generator, FastAPI will use [`iterate_in_threadpool()`](https://github.com/encode/starlette/blob/31164e346b9bd1ce17d968e1301c3bb2c23bb418/starlette/responses.py#L238) to run it in a separate thread and `await` it; whereas, if you provide an `async def` generator, it is run in the event loop (hence, performing blocking operations inside it would result in blocking the event loop). It is the same concept that is explained in detail in the link provided above. – Chris May 17 '23 at 18:11
  • @Chris Yeah, I meant to tag OP, my bad... – GooJ May 17 '23 at 20:00
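
To make the distinction described in Chris's comment concrete, here is a minimal, self-contained sketch (the route names and the one-second sleep are illustrative, not part of the original code). While /blocks_the_loop is streaming, every other request stalls; /does_not_block streams the same data without holding up the event loop:

import time

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/blocks_the_loop")
async def blocks_the_loop():
    # async generator: iterated directly on the event loop,
    # so the blocking time.sleep() stalls every other request
    async def gen():
        for i in range(5):
            time.sleep(1)
            yield f"chunk {i}\n"
    return StreamingResponse(gen(), media_type="text/plain")

@app.get("/does_not_block")
async def does_not_block():
    # plain generator: Starlette wraps it with iterate_in_threadpool(),
    # so the same time.sleep() runs in a worker thread
    def gen():
        for i in range(5):
            time.sleep(1)
            yield f"chunk {i}\n"
    return StreamingResponse(gen(), media_type="text/plain")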

1 Answer

Somehow, removing the async in front of the event generator worked out. As Chris's comment explains, a plain `def` generator is iterated by Starlette in a threadpool via `iterate_in_threadpool()`, so the blocking reads from the OpenAI stream no longer run on the event loop.
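
For reference, a minimal sketch of the corrected endpoint, i.e. the code from the question with `async` removed from `event_generator` (assuming, as above, that asyncify comes from the asyncer package):

import json

import openai
from asyncer import asyncify
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/ai_re/")
async def read_item(request: Request):
    base_prompt = "This is a prompt"
    sources = []
    response = await asyncify(openai.ChatCompletion.create)(
        model='gpt-3.5-turbo',
        messages=[{"role": "system", "content": base_prompt.strip()}],
        max_tokens=550,
        temperature=0.28,
        stream=True,
        n=1,
    )

    # plain `def` generator: Starlette iterates it via iterate_in_threadpool(),
    # so the blocking reads from the OpenAI stream run in a worker thread
    def event_generator():
        for event in response:
            event_text = event.choices[0].delta.content if "content" in event.choices[0].delta else ""
            event_data = {
                "texte": event_text,
                "source": sources
            }
            yield f"data: {json.dumps(event_data)}\n\n"

    return StreamingResponse(event_generator(), media_type="text/event-stream")

With the plain generator, Starlette awaits each chunk through the threadpool, so the event loop stays free to serve other requests between chunks.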