I came across a situation where we need to use a plain gRPC client (through the grpc.aio
API) to talk to an Arrow Flight gRPC server.
The DoGet call did make it to the server, and we received a FlightData in response. If our understanding of the Flight gRPC definition is correct, the response contains a FlatBuffers message that can somehow be decoded into a RecordBatch.
The client-side code is as follows:

    import asyncio
    import pathlib

    import grpc
    import pyarrow as pa
    import pyarrow.flight as pf

    import flight_pb2, flight_pb2_grpc


    async def main():
        ticket = pf.Ticket("tick")
        sock_file = pathlib.Path.cwd().joinpath("arena.sock").resolve()
        async with grpc.aio.insecure_channel(f"unix://{sock_file}") as channel:
            stub = flight_pb2_grpc.FlightServiceStub(channel)
            async for data in stub.DoGet(flight_pb2.Ticket(ticket=ticket.ticket)):
                assert type(data) is flight_pb2.FlightData
                print(data)
                # How to convert data into a RecordBatch?


    asyncio.run(main())
We are currently stuck on this last step: decoding the FlightData response.
The question is twofold:
- Are there existing facilities in pyarrow.flight that we can use to decode a Python gRPC object of the FlightData type?
- If there are none, what are some other options for decoding the content of the FlightData and reconstructing a RecordBatch from scratch?
The main interest here is to use asyncio with a plain gRPC client. As far as we can tell, this is not feasible with the current version of the Arrow Flight gRPC client.