0

I'm running a simple for-loop over a list of DataFrames, but as soon as I add an if clause it keeps erroring out.

df_list = [df1, df2, df3]
for df in df_list:
   if df in [df1, df2]:
      x = 1
   else:
      x = 2
.
.
.
ValueError: Can only compare identically-labeled DataFrame objects

Above is a simplified version of what I'm attempting. Can anyone tell me why this isn't working and how to fix it?

chicagobeast12

5 Answers

3

You could use DataFrame.equals with any instead:

df_list = [df1, df2, df3]
for df in df_list:
    if any(df.equals(y) for y in [df1, df2]):
        x = 1
    else:
        x = 2
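
As a minimal sketch of why this works: DataFrame.equals returns a single bool for each pair, so it is safe to use inside any(). The three small frames and the classify helper below are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample frames, for illustration only.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})
df3 = pd.DataFrame({"c": [5, 6]})


def classify(df, special):
    """Return 1 if df has the same contents as any frame in special, else 2."""
    # equals() yields one bool per comparison, even when labels differ.
    return 1 if any(df.equals(y) for y in special) else 2


results = [classify(df, [df1, df2]) for df in [df1, df2, df3]]
print(results)
```

Note that this compares contents, so a frame that merely holds the same data as df1 would also land in the first branch.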
2

Do NOT use .equals() here!

It's unnecessary and slows down your program; use id() instead:

df_list = [df1, df2, df3]
for df in df_list:
   if id(df) in [id(df1), id(df2)]: 
      x = 1
   else:
      x = 2

Because here you just need to compare the identities, rather than the values.
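
The same identity test can also be spelled with the `is` operator, which compares object identity directly. This is only a sketch under assumed data: the frames and the is_one_of helper are hypothetical names for illustration.

```python
import pandas as pd

# Hypothetical sample frames, for illustration only.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})
df3 = pd.DataFrame({"a": [1, 2]})  # same values as df1, but a separate object


def is_one_of(df, candidates):
    """True if df is the very same object as one of the candidates."""
    return any(df is c for c in candidates)


flags = [is_one_of(df, [df1, df2]) for df in [df1, df2, df3]]
print(flags)
```

Here df3 is not matched even though its values equal those of df1, because only identities are compared.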

Panwen Wang
  • Depending upon the use case `id` can be nice, but it could also be unwanted. For instance if `df3` was created with `df3=df1`, then they share the same `id`, yet for _some reason_ perhaps they should be handled differently. Guess that could be avoided with `df3=df1.copy()` so it's truly a different object, not just a reference – ALollz Apr 01 '22 at 18:12
  • If `df3=df1` is the case, then @ALollz's answer is the one to choose. Neither `id()` nor `.equals()` can distinguish them. But `id()` is able to tell `df3` from `df1` if `df3` is a copy of `df1`, while `.equals()` is not. – Panwen Wang Apr 01 '22 at 19:07
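
The distinction drawn in these comments can be checked with a short sketch. The names alias and clone are hypothetical and chosen only to illustrate plain assignment versus .copy().

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
alias = df1         # plain assignment: a second name for the same object
clone = df1.copy()  # a new object holding equal contents

alias_same_id = id(alias) == id(df1)  # identity cannot separate an alias
clone_same_id = id(clone) == id(df1)  # identity does separate a copy
alias_equals = alias.equals(df1)      # equals() cannot separate an alias
clone_equals = clone.equals(df1)      # equals() cannot separate a copy either

print(alias_same_id, clone_same_id, alias_equals, clone_equals)
```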
1

You could use a better container and reference them by labels.

Equality checks on large DataFrames with object dtypes can become slow, taking far longer than a second, whereas checking whether a label is in a list takes on the order of nanoseconds.

dfs = {'df1': df1, 'df2': df2, 'df3': df3}
for label, df in dfs.items():
    if label in ['df1', 'df2']:
        x = 1
    else:
        x = 2
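
A rough way to see the cost difference is to time both checks with timeit. This is only a sketch: the frame size, the column name "s", and the variable names are assumptions, and the actual timings depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Hypothetical large frame with a string column, for illustration only.
n = 200_000
big = pd.DataFrame({"s": [str(i) for i in range(n)]})
other = big.copy()

runs = 5
# Average time of a full content comparison between two equal frames.
t_equals = timeit.timeit(lambda: big.equals(other), number=runs) / runs
# Average time of a label membership test on a short list.
t_label = timeit.timeit(lambda: "df1" in ["df1", "df2"], number=runs) / runs

print(f"equals: {t_equals:.6f} s per call")
print(f"label lookup: {t_label:.9f} s per call")
```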
ALollz
0

You need to use df.equals()

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html

df_list = [df1, df2, df3]
for df in df_list:
   if df.equals(df1) or df.equals(df2):
      x = 1  # same branches as in the question
   else:
      x = 2
mortice
0

The following link might help: Pandas "Can only compare identically-labeled DataFrame objects" error

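According to that question, DataFrames compared with == must have the same columns and index; otherwise pandas raises this error. The `in` operator triggers it because a membership test on a list compares the elements with ==.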
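
A minimal sketch that reproduces the error, assuming two made-up frames with different column labels. A list membership test checks identity first and falls back to == only when the objects differ, which is why the first frame in the list passes and the second one fails.

```python
import pandas as pd

# Hypothetical frames with different column labels, for illustration only.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})

# df1 is the first element, so the identity check succeeds before == is tried.
first_ok = df1 in [df1, df2]

# df2 is not df1, so the two frames get compared with ==, which raises.
try:
    df2 in [df1, df2]
    message = None
except ValueError as exc:
    message = str(exc)

print(first_ok)
print(message)
```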

Alternatively, you can compare the data frames using the DataFrame.equals method. Please refer to the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html