Summary:
1) How to write a Pandas DataFrame to GCS (Google Cloud Storage) from within a Jupyter notebook (like an AI Notebook)
2) In the same notebook, how to load that object from GCS into a new dataset in BigQuery
Problem
I have an object that is large enough that downloading it locally and then writing it to GCS -> BQ is not feasible, yet not big enough to justify processing it with Apache Beam. I brought it into the notebook using the BQ cell magic. After making some transformations, I want to send the object back to my data repositories. I am therefore trying to copy it as AVRO, but I cannot figure out how to make it work. I have tried following this guide (https://github.com/ynqa/pandavro), but I still have not figured out how the call should be written.
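For context, this is roughly how the data was pulled in with the BigQuery cell magic (the %load_ext line goes in its own cell; the query and table name here are placeholders, not the real ones):

%load_ext google.cloud.bigquery

%%bigquery df4
SELECT *
FROM `my-project.my_dataset.source_table`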
I'm doing this:
import pandavro as pdx

OUTPUT_PATH = '{}/resumen2008a2019.avro'.format('gcs://xxxx')
pdx.to_avro(OUTPUT_PATH, df4)
That returns the following error: FileNotFoundError: [Errno 2] No such file or directory: 'gcs://xxxx'
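What I think should work instead, although I have not confirmed it, is serializing the DataFrame to an in-memory buffer and uploading that with the google-cloud-storage client; a minimal sketch, where 'xxxx' and the object name are the same placeholders as above:

import io

import pandavro as pdx
from google.cloud import storage

# Serialize the DataFrame to Avro in memory; pandavro only understands
# local paths and file-like objects, not gs:// URIs.
buf = io.BytesIO()
pdx.to_avro(buf, df4)
buf.seek(0)

# Upload the buffer to GCS with the official client library.
# 'xxxx' and the object name are placeholders.
client = storage.Client()
blob = client.bucket('xxxx').blob('resumen2008a2019.avro')
blob.upload_from_file(buf)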
Why not Parquet? PyArrow cannot convert one of the columns (mixed string/numeric data in salario, stored as object dtype): ArrowInvalid: ('Could not convert with type str: tried to convert to double', 'Conversion failed for column salario with type object')
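Presumably the fix would be forcing that column to a single dtype before writing Parquet, something like the sketch below, though I have not verified that coercing bad values to NaN is acceptable for my data:

import pandas as pd

# See what is actually stored in the offending column; object dtype
# usually means a mix of strings and numbers.
print(df4['salario'].map(type).value_counts())

# Coerce everything to numeric so pyarrow can infer a single Parquet type;
# values that cannot be parsed become NaN.
df4['salario'] = pd.to_numeric(df4['salario'], errors='coerce')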
Why not directly? I tried using this post as a guide (Write a Pandas DataFrame to Google Cloud Storage or BigQuery), but it is three years old and much of it no longer works that way.
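For the second part (getting the object into a new BigQuery dataset), loading the DataFrame straight from the notebook with the BigQuery client looks like the current route; a minimal sketch, where the project, dataset and table names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder destination table in the target dataset.
table_id = 'my-project.my_dataset.resumen2008a2019'

# Load the DataFrame directly into BigQuery and wait for the job to finish.
job = client.load_table_from_dataframe(df4, table_id)
job.result()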
Should I surrender and just write a classic ol' CSV?