1

I am trying to compare a Table in Snowflake database with it's exported CSV file, which fails due to numeric columns in CSV exponential and datetime with milliseconds

Converted both the table read from snowflake and CSV to data frame and comparing.

sel_query = "SELECT * FROM X_TABLE"
cur.execute(sel_query)

# making dataframe from ouput of sel_query
df1 = pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])
# making datafrmae from csv file
df2 = pd.read_csv("csv_file.csv", encoding="utf-8", sep=',')

dfBool = (df1 != df2).stack()
dfdiff = pd.concat([df1.stack()[dfBool], df2.stack()[dfBool]], axis=1)
dfdiff.columns = ["Snowflake data", "CSV data"]

print(dfdiff)

df1.equals(df2) should return True and dfdiff should be empty

Instead the ouput is as below for dfdiff:

                                Snowflake data               CSV data
0   TENURE                               452                      452
    NUMBER                             40379813              4.03798e+07
    TIME                    2018-04-09 11:34:49      2018-04-09 11:34:49.000
1   TENURE                               364                      364
    NUMBER                             40377580              4.03776e+07
    TIME                    2018-04-09 09:11:29      2018-04-09 09:11:29.000
  • dfBool = (df1 != df2).stack() -> are there other ways to compare the data to get the true/false answer you are looking for? Check out https://stackoverflow.com/questions/19917545/comparing-two-pandas-dataframes-for-differences – Rachel McGuigan Oct 21 '19 at 18:42

0 Answers0