I have a dictionary of pandas DFs which I save to a pickle file like this:

import pickle

with open('performance.pkl', 'wb') as handle:
    pickle.dump(performance, handle, protocol=pickle.HIGHEST_PROTOCOL)

Then I load the pickle file like this:

with open('performance.pkl', 'rb') as handle:
    a = pickle.load(handle)

When I inspect the contents of the dictionaries "performance" and "a", they are identical. However, if I do:

a == performance

I get:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
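A minimal sketch that reproduces this error, assuming pandas is installed (the frame contents and the names `d1`, `d2` are made up for illustration). Dictionary equality compares each pair of values with `==`; for two DataFrames, `==` is element-wise and returns another DataFrame, and converting that into a single True/False is what raises the `ValueError`. The unpickled frames are new objects, which is why the comparison reaches that step:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})
d1 = {"k": df}
# Equal contents but a different object, as after a pickle round-trip
d2 = {"k": df.copy()}

# DataFrame == DataFrame is element-wise and returns a DataFrame of booleans
elementwise = d1["k"] == d2["k"]
print(type(elementwise).__name__)  # DataFrame

# dict == dict compares the values with == and then needs a single bool,
# which a DataFrame refuses to provide
raised = False
try:
    d1 == d2
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# CPython skips the == call when both values are the very same object,
# so this comparison does not raise
print(d1 == {"k": df})  # True
```

The last line is only a curiosity of CPython's identity shortcut; it does not help after unpickling, where every frame is a fresh object.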

Furthermore:

a.keys() == performance.keys()
True

a.values() == performance.values()
False

(type(a), type(performance))
(dict, dict)
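The `keys()` and `values()` results above are ordinary Python behaviour and have nothing to do with pickling. A small sketch with plain integers (made-up data): `keys()` views are set-like and compare by content, while the Python docs note that an equality comparison between two `dict.values()` views always returns False:

```python
d1 = {"a": 1, "b": 2}
d2 = {"a": 1, "b": 2}

# keys() views are set-like, so == compares their contents
print(d1.keys() == d2.keys())  # True

# values() views have no content-based ==; two distinct views are never equal
print(d1.values() == d2.values())  # False

# Materialising the values compares them element by element (with DataFrames
# as values, this would raise the same ValueError as comparing the dicts)
print(list(d1.values()) == list(d2.values()))  # True
```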

Also, when looping over the DFs inside "a" and "performance" and comparing them one by one, they are identical.

Since I am comparing Python dictionaries, I am not sure what the problem is. I would prefer not to loop over the DFs inside "a" and "performance" one by one, since there are many indices inside each and it takes time.

By the way, I don't necessarily need to save as pickle; any other format that lets me save the dictionary would work.

Luis Miguel
  • When you compare dictionaries, it will still compare any object stored inside the dict. This could help https://stackoverflow.com/questions/43504568/compare-dictionaries-with-unhashable-or-uncomparable-values-e-g-lists-or-data – micric Sep 16 '19 at 13:47
  • What is `performance` and what is `a`? Are they dictionaries or dataframes? – Quang Hoang Sep 16 '19 at 13:53
  • @QuangHoang Yes. As the title of the question says "Comparing two dictionaries of pandas DFs..." they are dictionaries containing pandas DFs, done like this for very specific reasons. – Luis Miguel Sep 16 '19 at 14:12
  • Try to print `a` to see what you get, because the error says you are comparing dataframes. Is `a` a dictionary of dataframes? – Quang Hoang Sep 16 '19 at 14:13
  • Yes. A dictionary of DFs. Printing shows exactly what I say in the title of the question: the indices of the dictionaries and the corresponding DF for each index. – Luis Miguel Sep 16 '19 at 14:18
  • In which case, `df1 == df2` would yield the said errors. You need to do a loop: `for x in a: a[x].eq(performance[x]).all(None)`. – Quang Hoang Sep 16 '19 at 14:49
  • That would be equal to comparing df by df, which defeats the purpose of asking for a global comparison of 2 dictionaries. The indices are in the millions, so df by df comparison is not effective. – Luis Miguel Sep 16 '19 at 15:05
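Any global comparison still has to look at every element of every frame, so a loop over the dictionary keys is not the expensive part; what matters is avoiding Python-level iteration over rows. Below is a sketch of a hypothetical helper `frame_dicts_equal` (not a pandas function) that loops only over the keys and lets `DataFrame.equals` compare each whole frame in vectorized code. Unlike `.eq(...).all(None)`, `equals` also treats NaNs in the same position as equal and requires matching column dtypes. The sample data is made up, and the pickle round-trip is done in memory instead of through `performance.pkl`:

```python
import pickle

import pandas as pd


def frame_dicts_equal(d1: dict, d2: dict) -> bool:
    """Hypothetical helper: True if both dicts have the same keys and equal DataFrames."""
    if d1.keys() != d2.keys():
        return False
    # The loop is over dictionary keys only; DataFrame.equals compares each
    # whole frame (values, shape, column dtypes) and treats NaNs in the same
    # position as equal. all() stops at the first mismatching frame.
    return all(d1[key].equals(d2[key]) for key in d1)


# Made-up stand-in for the question's dictionary of DataFrames
performance = {
    "run1": pd.DataFrame({"score": [1.0, float("nan")], "n": [10, 20]}),
    "run2": pd.DataFrame({"score": [0.5, 0.7], "n": [30, 40]}),
}

# Round-trip through pickle in memory, mirroring the dump/load in the question
a = pickle.loads(pickle.dumps(performance, protocol=pickle.HIGHEST_PROTOCOL))

print(frame_dicts_equal(a, performance))  # True

a["run2"].iloc[0, 0] = 0.9
print(frame_dicts_equal(a, performance))  # False
```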

1 Answer

Try

a.equals(performance)

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.equals.html

Edit:

for key1 in a.keys():
    # DataFrame.equals checks shape, values and column dtypes of the two frames
    if a[key1].equals(performance[key1]):
        print(True)
    else:
        print(False)
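If you also want to know *where* two dictionaries differ, `pandas.testing.assert_frame_equal` raises an `AssertionError` whose message describes the mismatch. A sketch of a hypothetical helper built on it (the helper name and the sample data are illustrative, not part of pandas):

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def report_frame_differences(d1: dict, d2: dict) -> dict:
    """Hypothetical helper: map each key whose DataFrames differ to a short reason."""
    problems = {}
    # Union of keys, so keys present in only one dictionary are reported too
    for key in d1.keys() | d2.keys():
        if key not in d1 or key not in d2:
            problems[key] = "missing in one dictionary"
            continue
        try:
            # Raises AssertionError describing the mismatch (values, dtypes, index, ...)
            assert_frame_equal(d1[key], d2[key])
        except AssertionError as exc:
            problems[key] = str(exc)
    return problems


a = {"x": pd.DataFrame({"v": [1, 2]})}
b = {"x": pd.DataFrame({"v": [1, 3]}), "z": pd.DataFrame()}

print(report_frame_differences(a, a))          # {}
print(sorted(report_frame_differences(a, b)))  # ['x', 'z']
```

This checks every key rather than stopping at the first mismatch, so it is better suited to debugging than to a quick yes/no check.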
Madhur Yadav
  • AttributeError: 'dict' object has no attribute 'equals'. This is not about comparing DFs but comparing dictionaries containing DFs. – Luis Miguel Sep 16 '19 at 13:22