I am trying to compare a Table in Snowflake database with it's exported CSV file, which fails due to numeric columns in CSV exponential and datetime with milliseconds
Converted both the table read from snowflake and CSV to data frame and comparing.
sel_query = "SELECT * FROM X_TABLE"
cur.execute(sel_query)
# making dataframe from ouput of sel_query
df1 = pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])
# making datafrmae from csv file
df2 = pd.read_csv("csv_file.csv", encoding="utf-8", sep=',')
dfBool = (df1 != df2).stack()
dfdiff = pd.concat([df1.stack()[dfBool], df2.stack()[dfBool]], axis=1)
dfdiff.columns = ["Snowflake data", "CSV data"]
print(dfdiff)
df1.equals(df2) should return True and dfdiff should be empty
Instead the ouput is as below for dfdiff:
Snowflake data CSV data
0 TENURE 452 452
NUMBER 40379813 4.03798e+07
TIME 2018-04-09 11:34:49 2018-04-09 11:34:49.000
1 TENURE 364 364
NUMBER 40377580 4.03776e+07
TIME 2018-04-09 09:11:29 2018-04-09 09:11:29.000