I have the following pandas dataframe:

import pandas as pd
df = pd.DataFrame({"id": [1, 2, 3], "items": [('a', 'b'), ('a', 'b', 'c'), tuple('d')]})

>print(df)
   id      items
0   1     (a, b)
1   2  (a, b, c)
2   3       (d,)

After registering my GCP/BQ credentials in the normal way...

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_my_creds.json"

... I try to export it to a BQ table:

import pandas_gbq
pandas_gbq.to_gbq(df, "my_table_name", if_exists="replace")

but I keep getting the following error:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 1205, in to_gbq
...
File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 342, in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)
  File "pyarrow/array.pxi", line 915, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object

I have tried converting the tuple column to a string with df = df.astype({"items": str}) and adding a table_schema parameter to the pandas_gbq.to_gbq... line, but I keep getting the same error.

I have also tried replacing the pandas_gbq.to_gbq... line with the bq_client.load_table_from_dataframe method described here, but I still get the same pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object error.

Max Power

1 Answer

I think this comes from pandas dtypes being separate from Python types: astype(str) turns each value into a Python str, but the column itself keeps the generic object dtype rather than the pandas string dtype. Try converting the dtype as well, right after the astype statement.

That is, this line:

df = df.astype({"items": str})

is replaced with:

df = df.astype({"items": str})
df = df.convert_dtypes()
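
The same idea can be made more explicit. Below is a minimal sketch, assuming a STRING column is acceptable in the target table: each tuple is serialized to a JSON array string, and the column is then cast to the pandas string dtype. The helper names items_to_json and stringify_items are made up for illustration, and the sketch only covers the pandas side, not the upload to BigQuery.

```python
import json


def items_to_json(value):
    """Serialize one tuple (or list) of items to a JSON array string.

    Assumes every value in the column is a tuple or list (no NaN/None).
    """
    return json.dumps(list(value))


def stringify_items(df, column="items"):
    """Return a copy of a pandas DataFrame with `column` stored as strings.

    Hypothetical helper: maps each tuple to a JSON string, then casts the
    column from the generic object dtype to the pandas "string" dtype.
    """
    out = df.copy()
    out[column] = out[column].map(items_to_json).astype("string")
    return out
```

Calling df = stringify_items(df) before the pandas_gbq.to_gbq... line would leave the items column holding values such as ["a", "b"] and ["d"] as plain strings, which can be parsed back with json.loads when reading the table.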

Let me know if this works.

Jeremy Savage