I used the code below to load data from Snowflake into a Pandas DataFrame for a long time, until I updated to snowflake-connector-python==2.7.0 and pyarrow==5.0.0:

    from snowflake import connector

    # user, pwd and data_sql are defined elsewhere
    ctx = connector.connect(
        user=user,
        password=pwd,
        account="***.eu-central-1",
        warehouse="***",
        database="***",
    )
    cur = ctx.cursor()
    cur.execute(data_sql)
    # Issue occurs here
    long_data_df = cur.fetch_pandas_all()

Everything worked as expected, but after updating to the versions mentioned above, the DataFrame is generated with a non-unique index:
index | colA |
---|---|
0 | val1 |
0 | val2 |
0 | val3 |
1 | val4 |
With pyarrow==3.0.0 and snowflake-connector-python==2.4.6 (I am not sure in which version this change/bug occurred), the DataFrame looked like this:
index | colA |
---|---|
1 | val1 |
2 | val2 |
3 | val3 |
4 | val4 |
The trouble occurs when you try to pd.concat a DataFrame that has a non-unique index. It fails with pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
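A possible workaround, sketched here without a Snowflake connection, is to discard the index returned by the connector and rebuild a fresh one before concatenating. The frame below is hand-built stand-in data that mimics the duplicated index shown above, and `other_df` is a hypothetical second frame used only for illustration. This does not address why the connector returns duplicate labels.

```python
import pandas as pd

# Stand-in for the result of cur.fetch_pandas_all(); the duplicated index
# labels are hand-written to mimic the behaviour described in the question.
long_data_df = pd.DataFrame(
    {"colA": ["val1", "val2", "val3", "val4"]},
    index=[0, 0, 0, 1],
)
print(long_data_df.index.is_unique)  # False

# Drop the existing labels and replace them with a fresh RangeIndex.
long_data_df = long_data_df.reset_index(drop=True)
print(long_data_df.index.is_unique)  # True

# Hypothetical second frame; with unique indexes on both sides,
# a column-wise concat aligns rows by label.
other_df = pd.DataFrame({"colB": [10, 20, 30, 40]})
combined = pd.concat([long_data_df, other_df], axis=1)
print(combined)
```

If the frames are being stacked row-wise, passing ignore_index=True to pd.concat is another option, since it renumbers the rows of the result.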