Let's say generically my setup is like this:
import io
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import pyarrow as pa
import pyarrow.ipc as ipc

app = FastAPI()

@app.get("/api/getdata")
async def getdata():
    table = pa.Table.from_pydict({
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 22]})
    ### Not really sure what goes here
    ## something like this...
    sink = io.BytesIO()
    with ipc.new_file(sink, table.schema) as writer:
        for batch in table.to_batches():
            writer.write(batch)
    sink.seek(0)
    return StreamingResponse(content=sink, media_type="application/vnd.apache.arrow.file")
This works, but I'm copying the whole table into the BytesIO first. It seems like what I need is a generator that yields whatever writer.write(batch) would have written to the buffer, instead of actually buffering it, but I don't know how to do that. I tried using pa.BufferOutputStream instead of BytesIO, but FastAPI won't accept that as a return object.
My goal is to be able to get the data on the js side like this...
import { tableFromIPC } from "apache-arrow";
const table = await tableFromIPC(fetch("/api/getdata"));
console.table([...table]);
My approach works; I'd just like to know if there's a way to do this without the copying.