I'm running the following code:
import pyarrow
import pyarrow.parquet as pq
import pandas as pd
import json
parquet_schema = schema = pyarrow.schema([
    ('id', pyarrow.string()),
    ('firstname', pyarrow.string()),
    ('lastname', pyarrow.string()),
])
user_json = '{"id" : "id1", "firstname": "John", "lastname":"Doe"}'
writer = pq.ParquetWriter('user.parquet', schema=parquet_schema)
df = pd.DataFrame.from_dict(json.loads(user_json))
table = pyarrow.Table.from_pandas(df)
print(table.schema)
writer.write_table(table)
writer.close()
but I'm getting the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-a427a4cdd392> in <module>()
15 writer = pq.ParquetWriter('user.parquet', schema=parquet_schema)
16
---> 17 df = pd.DataFrame.from_dict(json.loads(user_json))
18 table = pyarrow.Table.from_pandas(df)
19 print(table.schema)
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in extract_index(data)
385
386 if not indexes and not raw_lengths:
--> 387 raise ValueError("If using all scalar values, you must pass an index")
388
389 if have_series:
ValueError: If using all scalar values, you must pass an index
I followed the docs and tutorials, but I'm missing something.
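The traceback points at the pandas call, not at pyarrow: json.loads returns a dict whose values are all scalars, so pandas cannot tell how many rows to build. A minimal sketch of one possible workaround, assuming the single JSON object is meant to become exactly one row (the name `record` is introduced here for illustration only):

```python
import json

import pandas as pd

user_json = '{"id" : "id1", "firstname": "John", "lastname":"Doe"}'
record = json.loads(user_json)  # a dict whose values are all scalars

# pd.DataFrame.from_dict(record) raises the ValueError shown above, because
# pandas cannot infer a row count from scalar values alone.
# Wrapping the dict in a list treats it as a single row instead.
df = pd.DataFrame([record])

print(df.shape)          # (1, 3)
print(list(df.columns))  # ['id', 'firstname', 'lastname']
```

The resulting one-row frame can then be handed to pyarrow.Table.from_pandas as in the code above. Passing index=[0] to the pd.DataFrame constructor should be an equivalent way to supply the missing index.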