I used the code below to load data from Snowflake into a pandas DataFrame for a long time, until I updated to snowflake-connector-python==2.7.0 and pyarrow==5.0.0:

    from snowflake import connector

    ctx = connector.connect(
        user=user,
        password=pwd,
        account="***.eu-central-1",
        warehouse="***",
        database="***",
    )

    cur = ctx.cursor()

    cur.execute(data_sql)

    # Issue occurs here
    long_data_df = cur.fetch_pandas_all()

Everything worked as expected, but after updating to the versions mentioned above, the DataFrame is generated with a non-unique index:

    index  colA
    0      val1
    0      val2
    0      val3
    1      val4

With pyarrow==3.0.0 and snowflake-connector-python==2.4.6 (I am not sure in which version this change/bug occurred), the DataFrame looked like this:

    index  colA
    1      val1
    2      val2
    3      val3
    4      val4

The trouble starts when you call pd.concat on a DataFrame with a non-unique index: it fails with pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
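A minimal sketch of how the duplicated index could be detected before concatenating. The frame here is built by hand to mimic the non-unique index shown above; it is an assumption for illustration, not real connector output, and the variable names `index_is_unique` and `duplicate_mask` are hypothetical.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by fetch_pandas_all()
# (assumed data, mirroring the non-unique index from the question).
long_data_df = pd.DataFrame(
    {"colA": ["val1", "val2", "val3", "val4"]},
    index=[0, 0, 0, 1],
)

# False as soon as any index label occurs more than once.
index_is_unique = long_data_df.index.is_unique

# Boolean mask that flags every repeat of an already-seen label.
duplicate_mask = long_data_df.index.duplicated()

print("index is unique:", index_is_unique)
print("repeated labels:", int(duplicate_mask.sum()))
```

A check like this right after the fetch makes the problem visible at the source instead of later, when the frame is combined with others.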

jhettler
  • Looks like a bug in the Snowflake connector, which has hopefully been fixed in this PR (yet to be included in a new release, though): https://github.com/snowflakedb/snowflake-connector-python/pull/1068 – nedned Mar 14 '22 at 07:09

1 Answer

We didn't want to downgrade the pyarrow and snowflake-connector-python packages, so we solved it by resetting the index of the pandas DataFrame right after fetching:

    long_data_df = long_data_df.reset_index(drop=True)
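As a rough illustration of what this does, the sketch below applies the same call to a hand-built frame with the duplicated index from the question. The data and the name `fixed_df` are assumptions for the example; in real code the frame would come from `cur.fetch_pandas_all()`.

```python
import pandas as pd

# Assumed stand-in for the fetched frame, with the duplicated index
# described in the question.
long_data_df = pd.DataFrame(
    {"colA": ["val1", "val2", "val3", "val4"]},
    index=[0, 0, 0, 1],
)

# drop=True discards the old index instead of keeping it as a column,
# and the frame gets a fresh 0..n-1 RangeIndex.
fixed_df = long_data_df.reset_index(drop=True)

print(fixed_df.index.is_unique)
print(fixed_df)
```

Row order and column values are unchanged; only the index labels are replaced.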
Dharman
jhettler