I'm trying to format data to feed into an API. The request body it expects looks like this:
{ userEvent: { eventType: "home-page-view", visitorId: "visitor-1", userInfo: { userId: "user-1" } } }
{ userEvent: { eventType: "home-page-view", visitorId: "visitor-2", userInfo: { userId: "user-2" } } }
{ userEvent: { eventType: "home-page-view", visitorId: "visitor-3", userInfo: { userId: "user-3" } } }
...
I have a query that selects the required columns in BigQuery and then exports the table result to GCS in NDJSON format. However, I have no clue how to add a userEvent key that wraps the currently available results, either (1) during the query or (2) after storing the NDJSON file in GCS. Given that the expected number of rows will be roughly one million, I wonder what the most efficient solution is, as I'd like to avoid a for loop that goes through each line.
This is what I have now:
# get the BigQuery result and store in the "results" variable
query_job = bq_client.query(query, job_config=job_config, location="US")
results = query_job.result()  # <google.cloud.bigquery.table.RowIterator at 0x148728310>
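For option (1), one possible direction is to let BigQuery do the nesting, so no per-row work happens in Python. The sketch below is only an assumption about how this could look: `wrap_query_in_user_event` is a hypothetical helper (not part of any library) that wraps the existing query as a subquery and selects the whole row as a single STRUCT column named userEvent. It assumes the inner query already returns eventType, visitorId and a nested userInfo column.

```python
def wrap_query_in_user_event(inner_query: str, key: str = "userEvent") -> str:
    """Return SQL that nests every column of inner_query under one STRUCT column.

    Hypothetical helper for illustration; it only builds a SQL string.
    """
    # Drop surrounding whitespace and a trailing semicolon so the query
    # can be embedded as a subquery.
    inner = inner_query.strip().rstrip(";")
    # Selecting the subquery alias `t` yields the whole row as a STRUCT,
    # which is then renamed to the wrapper key.
    return f"SELECT t AS {key}\nFROM (\n{inner}\n) AS t"
```

The wrapped string would then be passed to the existing call, for example bq_client.query(wrap_query_in_user_event(query), job_config=job_config, location="US"), and the export to GCS should then emit one {"userEvent": {...}} object per line. This has not been verified against the actual table.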
# The current NDJSON file in GCS
{"eventType":"home-page-view","visitorId":"13245","userInfo":{"userId":"11111"}}
{"eventType":"home-page-view","visitorId":"56789","userInfo":{"userId":"22222"}}
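For option (2), a minimal sketch of post-processing the exported file, assuming each line is one complete JSON object. It still visits every line, but it only concatenates strings and never parses the JSON, so a million rows should stay cheap. `wrap_ndjson_lines` is a hypothetical name, and reading from or writing back to GCS is left out.

```python
from typing import Iterable, Iterator


def wrap_ndjson_lines(lines: Iterable[str], key: str = "userEvent") -> Iterator[str]:
    """Lazily wrap each NDJSON record as {"<key>": <record>} without parsing it."""
    for line in lines:
        record = line.strip()
        if not record:
            # Skip blank lines, e.g. a trailing newline at the end of the file.
            continue
        # Pure string concatenation: valid as long as `record` is a JSON object.
        yield f'{{"{key}":{record}}}'
```

Because this is a generator, the wrapped lines can be streamed into an output file one at a time instead of holding the whole export in memory.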