Comparing two dictionaries of pandas df returns errors when they are identical

Question

I have a dictionary of pandas DataFrames, which I save to a pickle file like this:

import pickle

with open('performance.pkl', 'wb') as handle:
    pickle.dump(performance, handle, protocol=pickle.HIGHEST_PROTOCOL)

Then I load the pickle file like this:

with open('performance.pkl', 'rb') as handle:
    a = pickle.load(handle)

When I inspect the contents of the dictionaries "performance" and "a", they are identical. However, if I do:

a == performance

I get:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
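The error comes from how dictionary equality works: `dict.__eq__` compares the values stored under each key with `==` and needs a plain `True`/`False` back. For DataFrames, `==` is element-wise and returns another DataFrame, and calling `bool()` on that DataFrame raises exactly this `ValueError`. After unpickling, every DataFrame in `a` is a new object, so CPython's shortcut for comparing an object with itself never applies. A minimal reproduction, assuming a one-entry dictionary with a made-up key `"run1"` and an in-memory pickle round trip standing in for the file:

```python
import pickle

import pandas as pd

# Made-up stand-in for the question's dictionary of DataFrames
performance = {"run1": pd.DataFrame({"x": [1, 2, 3]})}

# Same round trip as in the question, but in memory instead of via performance.pkl
a = pickle.loads(pickle.dumps(performance, protocol=pickle.HIGHEST_PROTOCOL))

print(a["run1"] is performance["run1"])                 # False: a new object
print(type(a["run1"] == performance["run1"]).__name__)  # DataFrame, not bool

error = None
try:
    a == performance  # dict equality calls bool() on each element-wise result
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError
```

So the dictionaries are not "different"; Python simply cannot reduce the DataFrame-valued comparison to a single truth value.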

Furthermore:

a.keys() == performance.keys()
True

a.values() == performance.values()
False

(type(a), type(performance))
(dict, dict)
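The `keys()` and `values()` results differ for a reason unrelated to pandas: `dict.keys()` returns a set-like view that compares by content, while a `dict.values()` view does not implement content-based equality, so `==` falls back to identity and is `False` even for two calls on the same dictionary. A small illustration with plain integers (the dictionaries `d1` and `d2` are made up for the example):

```python
d1 = {"k": 1}
d2 = {"k": 1}

keys_equal = d1.keys() == d2.keys()        # True: keys views are set-like
values_equal = d1.values() == d2.values()  # False: values views compare by identity
same_dict = d1.values() == d1.values()     # False: each call creates a new view

print(keys_equal, values_equal, same_dict)  # True False False
```

Converting to lists, e.g. `list(a.values()) == list(performance.values())`, would compare contents, but with DataFrames as elements it runs into the same ambiguous-truth-value error.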

Also, when I loop over the DFs inside "a" and "performance" and compare them one by one, they are identical.

Since I am comparing Python dictionaries, I am not sure what the problem is. I would rather not loop over the DFs inside "a" and "performance" one by one, since there are many indices in each and it takes time.
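Some per-entry loop is unavoidable, since every DataFrame has to be inspected, but that loop runs once per dictionary key, not once per row: each `DataFrame.equals` call compares a whole frame in pandas' vectorized code and treats NaNs in matching positions as equal. A minimal sketch, assuming every value in both dictionaries is a DataFrame; `dicts_of_frames_equal` is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd


def dicts_of_frames_equal(left, right):
    """Hypothetical helper: True if both dicts have the same keys and
    each pair of DataFrames is equal according to DataFrame.equals."""
    if left.keys() != right.keys():
        return False
    # all() stops at the first pair of DataFrames that differs
    return all(left[key].equals(right[key]) for key in left)


performance = {
    "run1": pd.DataFrame({"x": [1.0, np.nan, 3.0]}),
    "run2": pd.DataFrame({"y": [4, 5, 6]}),
}
a = {key: df.copy() for key, df in performance.items()}

print(dicts_of_frames_equal(a, performance))  # True

a["run2"].loc[0, "y"] = 99
print(dicts_of_frames_equal(a, performance))  # False
```

With the question's objects, `dicts_of_frames_equal(a, performance)` would take the place of `a == performance`. Note that `equals` also requires matching column dtypes, so a column that changed from `int64` to `float64` counts as a difference.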

By the way, I don't necessarily need to save as pickle; any other format that lets me save the dictionary would work.

When you compare dictionaries, it will still compare any object stored inside the dict. This could help https://stackoverflow.com/questions/43504568/compare-dictionaries-with-unhashable-or-uncomparable-values-e-g-lists-or-data — micric, Sep 16 '19 at 13:47
What is `performance` and what is `a`? Are they dictionaries or dataframes? — Quang Hoang, Sep 16 '19 at 13:53
@QuangHoang Yes. As the title of the question says, "Comparing two dictionaries of pandas DFs...", they are dictionaries containing pandas DFs, done like this for very specific reasons. — Luis Miguel, Sep 16 '19 at 14:12
Try to print `a` to see what you get, because the error says you are comparing dataframes. Is `a` a dictionary of dataframes? — Quang Hoang, Sep 16 '19 at 14:13
Yes. A dictionary of DFs. Printing shows exactly what I say in the title of the question: the indices of the dictionaries and the corresponding DF for each index. — Luis Miguel, Sep 16 '19 at 14:18
In which case, `df1 == df2` would yield the said errors. You need to do a loop: `for x in a: a[x].eq(performance[x]).all(None)`. — Quang Hoang, Sep 16 '19 at 14:49
That would amount to comparing DF by DF, which defeats the purpose of asking for a global comparison of two dictionaries. The indices are in the millions, so a DF-by-DF comparison is not efficient. — Luis Miguel, Sep 16 '19 at 15:05
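One caveat with the `eq(...).all(None)` loop suggested in the comments: element-wise `eq` follows IEEE rules, so `NaN` never equals `NaN`, and two frames with missing values in the same place are reported as different. `DataFrame.equals` treats NaNs in matching positions as equal. A small check with a made-up one-column frame:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [1.0, np.nan]})
right = left.copy()

# Element-wise comparison: the NaN cell compares as False
eq_all = bool(left.eq(right).all(axis=None))
# Whole-frame comparison: NaNs in the same position count as equal
equals = bool(left.equals(right))

print(eq_all, equals)  # False True
```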

Madhur Yadav · Answer 1 · 2019-09-16T16:53:30.443


Try

a.equals(performance)

References: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.equals.html

Edit:

for key1 in a.keys():
    # compare each pair of DataFrames with DataFrame.equals
    if a[key1].equals(performance[key1]):
        print(True)
    else:
        print(False)

edited Sep 16 '19 at 16:53

answered Sep 16 '19 at 13:21

Madhur Yadav


AttributeError: 'dict' object has no attribute 'equals'. This is not about comparing DFs but comparing dictionaries containing DFs. – Luis Miguel Sep 16 '19 at 13:22

