
Setting up Server-Sent Events is relatively simple, especially using FastAPI. You can do something like this:

import time

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def fake_data_streamer():
    for i in range(10):
        yield "some streamed data"
        time.sleep(0.5)

@app.get('/')
async def main():
    return StreamingResponse(fake_data_streamer())

And upon an HTTP GET, the client will receive "some streamed data" every 0.5 seconds.

What if I wanted to stream structured data, though? For example, I want each chunk to be JSON, so something like:

import json
import time

def fake_data_streamer():
    for i in range(10):
        yield json.dumps({'result': 'a lot of streamed data', "seriously": ["so", "much", "data"]}, indent=4)
        time.sleep(0.5)

Is this a proper server-side implementation? Does this pose a risk of the client receiving partially formed payloads? That is fine in the plaintext case, but it would make the JSON un-parseable.
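
One convention that sidesteps the partial-payload concern (this is an assumption about what you might adopt, not part of the snippet above) is newline-delimited JSON: keep each object compact on a single line and end it with `"\n"`, so the newline unambiguously terminates one complete event. A minimal sketch of such a generator, with a hypothetical name:

```python
import json

def ndjson_streamer():
    # Default (compact) json.dumps formatting keeps each object on one
    # line, so the trailing "\n" marks the end of a complete JSON event.
    # Note: indent=4 would break this by adding internal newlines.
    for i in range(3):
        yield json.dumps({"result": "a lot of streamed data", "index": i}) + "\n"
```

The generator would be wrapped in `StreamingResponse` exactly as before; FastAPI's `StreamingResponse` also accepts a `media_type` argument if you want to advertise the content type.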

If this is okay, how would you read it from the client side? Something like this:

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            while True:
                chunk = await resp.content.readuntil(b"\n")
                if not chunk:
                    break
                print(chunk)
                await asyncio.sleep(1)

Although I am not sure which separator and reading mode are appropriate to ensure that the client always receives fully formed JSON events.
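
Assuming newline-delimited framing on the server, the client-side logic reduces to buffering and splitting on `"\n"`: everything before the last newline is complete, and anything after it is a partial event to carry into the next read. A transport-agnostic sketch (the function name is hypothetical):

```python
import json

def parse_ndjson(buffer: str):
    # Everything before the final "\n" is a complete line; whatever
    # follows it is a partial event to prepend to the next chunk read.
    *lines, remainder = buffer.split("\n")
    objects = [json.loads(line) for line in lines if line]
    return objects, remainder
```

You would call this with each chunk received (prefixed with the previous `remainder`), so a partially delivered object is never handed to `json.loads`.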

Or is this a completely improper way to accomplish streaming of fully formed JSON events?

For reference, OpenAI achieves streaming of complete JSON objects in their API: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb

  • Not sure about this, but I asked OpenAI to always send me a response in Markdown; this way the UI can tell that in the next few lines there will be JSON inside a fenced code block. Even if the response is mixed content, the UI can still tell because of that fenced code block. – Sandi Jun 16 '23 at 02:40
  • Thanks for that, I think I found the answer - there is a content type called "newline-delimited JSON" - so basically you just read the stream and wait for the next newline, which guarantees that you receive valid JSON objects. I will post an answer + my reference in the next few hours. – GenericDeveloperProfile Jun 16 '23 at 02:49
  • Did not know that! Thanks! – Sandi Jun 16 '23 at 03:02
  • You might find [this answer](https://stackoverflow.com/a/75760884/17865804), as well as [this answer](https://stackoverflow.com/a/75837557/17865804) and [this answer](https://stackoverflow.com/a/76122475/17865804) helpful – Chris Aug 03 '23 at 03:40

1 Answer

To clarify and provide some context: OpenAI's client API post-processes Server-Sent Events to turn them into nice JSON. But the "raw" events are sent as data: lines per the Server-Sent Events spec, which is here: https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events. There are links in the IPython notebook you mentioned that point to this spec and other resources in the notebook's introduction.

You can query the OpenAI endpoint to see the raw Server-Sent Events yourself:

https://asciinema.org/a/TKDYxqh6pgCNN0hX6tdcSgfzU

I am using this curl command:

curl https://api.openai.com/v1/chat/completions -u :$OPENAI_API_KEY  -H 'Content-Type: application/json' -d '{"stream": true,"model": "gpt-3.5-turbo-0613", "messages": [ {"role": "user", "content": "Summarize Othello by Shakespeare in one line"}]}'

Now to answer your question. There is nothing preventing you from doing what you intend. The sample code in the spec shows you can do this very simply by placing "\n\n" at the end of each data: event and having the client parse on that boundary (after it sees the relevant header, Content-Type: text/event-stream). The Requests library in Python actually does this for you automatically via the response_obj.iter_lines() function. Python also has an easy-to-read SSE client library that can parse these events and give you the full data: lines.
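
As a rough illustration of that framing (a sketch, not OpenAI's or any library's actual parser), a function that splits a raw text/event-stream body into its data: payloads might look like:

```python
def parse_sse(raw: str):
    # Events are separated by a blank line ("\n\n"); an event's payload
    # is the concatenation of its "data:" field values, joined by "\n".
    events = []
    for block in raw.split("\n\n"):
        data_lines = [line[len("data:"):].lstrip()
                      for line in block.split("\n")
                      if line.startswith("data:")]
        if data_lines:
            events.append("\n".join(data_lines))
    return events
```

Because each event boundary is explicit, a JSON object placed in a data: field arrives whole, and the client can safely json.loads each payload.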

Sid