
I have a DataFrame with a column that contains dictionaries. My task is to compare the first two values inside each dict and, if they are equal, collect the entire row. I can't show any code of mine because I really don't know how to approach this, but here is a small example of my DataFrame to make the situation clearer.

import pandas as pd
test = pd.DataFrame({'one':['hello', 'there', 'every', 'body'],
       'two': ['a', 'b', 'c', 'd'],
       'dict': [{'composition': 12, 'process': 4, 'pathology': 4},
                {'food': 9, 'composition': 9, 'process': 6, 'other_meds': 3},
                {'process': 2},
                {'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1}]})
test

So the data looks like this:

    one    two  dict
0   hello   a   {'composition': 12, 'process': 4, 'pathology': 4}
1   there   b   {'food': 9, 'composition': 9, 'process': 6, 'other_meds': 3}
2   every   c   {'process': 2}
3   body    d   {'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1}

My goal is to collect rows 1 and 3 into a new DataFrame, because the first two values of their dicts are equal: 'food': 9, 'composition': 9 and 'composition': 6, 'other_meds': 6. Row 0 also contains equal values, but it is not of interest because they are not in the first and second positions.

I know that loc and iloc are used to select rows, but I don't know how to express the condition on the dictionary. Please help!

Alona
  • It is not reliable to trust the order of dictionaries... `{'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1} == {'composition': 6, 'pathology': 2, 'process': 1, 'other_meds': 6}`, for example – Dani Mesejo Dec 03 '20 at 20:29
  • @Dani Mesejo my values are sorted in advance! – Alona Dec 03 '20 at 20:32
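To illustrate the point raised in these comments: since Python 3.7, dicts preserve insertion order, so `d.values()` yields values in the order the keys were inserted, yet `==` between two dicts ignores that order. A minimal sketch, using the two dicts from the comment:

```python
# The same items, inserted in a different order
a = {'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1}
b = {'composition': 6, 'pathology': 2, 'process': 1, 'other_meds': 6}

# Dict equality compares key/value pairs only, not their order
print(a == b)  # True

# Iteration follows insertion order (guaranteed since Python 3.7)
print(list(a.values())[:2])  # [6, 6]
print(list(b.values())[:2])  # [6, 2]
```

So a "first two values" check is only meaningful if, as the asker says, the dicts were built in the intended order.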

3 Answers


You could do:

import pandas as pd

test = pd.DataFrame({'one': ['hello', 'there', 'every', 'body'],
                     'two': ['a', 'b', 'c', 'd'],
                     'dict': [{'composition': 12, 'process': 4, 'pathology': 4},
                              {'food': 9, 'composition': 9, 'process': 6, 'other_meds': 3},
                              {'process': 2},
                              {'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1}]})


def equal_values(d):
    try:
        # extract first and second value
        first, second, *_ = d.values()
        return first == second
    except ValueError:
        return False  # if there are not two values


res = test[test['dict'].apply(equal_values)]
print(res)

Output

     one two                                               dict
1  there   b  {'food': 9, 'composition': 9, 'process': 6, 'o...
3   body   d  {'composition': 6, 'other_meds': 6, 'pathology...
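Since the question mentions loc: the boolean Series built from the dict column can also be passed to `.loc`. The sketch below is a variant, not the answerer's code. The helper name `first_two_equal` is hypothetical, and it swaps the try/except for `itertools.islice`, which reads at most two values from each dict.

```python
from itertools import islice

import pandas as pd

test = pd.DataFrame({'one': ['hello', 'there', 'every', 'body'],
                     'two': ['a', 'b', 'c', 'd'],
                     'dict': [{'composition': 12, 'process': 4, 'pathology': 4},
                              {'food': 9, 'composition': 9, 'process': 6, 'other_meds': 3},
                              {'process': 2},
                              {'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1}]})


def first_two_equal(d):
    # Take at most two values without building a list of all of them
    head = list(islice(d.values(), 2))
    # Dicts with fewer than two entries cannot match
    return len(head) == 2 and head[0] == head[1]


mask = test['dict'].map(first_two_equal)
# .loc accepts a boolean Series aligned on the index;
# .copy() makes the result an independent DataFrame
new_df = test.loc[mask].copy()
print(new_df.index.tolist())  # [1, 3]
```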

The notation:

first, second, *_ = d.values()

is known as extended iterable unpacking; see this answer for a broad explanation and this post for an entry-level tutorial.

In the particular case above, it means: take the first and the second value, and ignore the remaining ones (*_).
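A short sketch of how this unpacking behaves with different numbers of values (the lists here are illustrative, taken from the dicts in the question):

```python
# Extended iterable unpacking: the starred name collects whatever is left over
first, second, *rest = [9, 9, 6, 3]
print(first, second, rest)  # 9 9 [6, 3]

# With exactly two items, the starred name gets an empty list
a, b, *leftover = [6, 6]
print(a, b, leftover)  # 6 6 []

# With fewer than two items Python raises ValueError,
# which is why equal_values catches it and returns False
try:
    x, y, *_ = [2]
except ValueError as exc:
    error_name = type(exc).__name__
print(error_name)  # ValueError
```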

Dani Mesejo
  • Thank you for the solution! But could you please expand your answer with a better explanation of `first, second, *_ = d.values()`, and especially what `*_` is? – Alona Dec 03 '20 at 20:41
    @Alona updated the answer, with pointers to good explanations – Dani Mesejo Dec 03 '20 at 20:50

The idea is that you have a list of dicts. Since the keys differ between dicts, we first need to find the first two keys, if there are any. Then we look up the values for those keys and, if they match, add the dict to the list:

dict_data = [{'composition': 12, 'process': 4, 'pathology': 4},
                     {'food': 9, 'composition': 9, 'process': 6, 'other_meds': 3},
                     {'process': 2},
                     {'process': 2, 'other_meds': 6},
                     {'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1}]
new_list = []
for item in dict_data:
    val_keys = list(item.keys())
    if len(val_keys) >= 2 and item[val_keys[0]] == item[val_keys[1]]:
        new_list.append(item)
        print(item)
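The loop above collects the dicts themselves, while the question asks for entire rows. A sketch applying the same key-based check to the question's `test` DataFrame, assuming the helper name `first_two_keys_equal` (hypothetical):

```python
import pandas as pd

test = pd.DataFrame({'one': ['hello', 'there', 'every', 'body'],
                     'two': ['a', 'b', 'c', 'd'],
                     'dict': [{'composition': 12, 'process': 4, 'pathology': 4},
                              {'food': 9, 'composition': 9, 'process': 6, 'other_meds': 3},
                              {'process': 2},
                              {'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1}]})


def first_two_keys_equal(item):
    # Same check as the loop: find the first two keys, then compare their values
    val_keys = list(item.keys())
    return len(val_keys) >= 2 and item[val_keys[0]] == item[val_keys[1]]


# Indexing with the boolean Series keeps whole rows, not just the dicts
new_df = test[test['dict'].map(first_two_keys_equal)]
print(new_df)
```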
In [2]: import pandas as pd
   ...: test = pd.DataFrame({'one':['hello', 'there', 'every', 'body'],
   ...:        'two': ['a', 'b', 'c', 'd'],
   ...:        'dict': [{'composition': 12, 'process': 4, 'pathology': 4},
   ...:                 {'food': 9, 'composition': 9, 'process': 6, 'other_meds': 3},
   ...:                 {'process': 2},
   ...:                 {'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1}]})
   ...: test
Out[2]: 
     one two                                               dict
0  hello   a  {'composition': 12, 'process': 4, 'pathology': 4}
1  there   b  {'food': 9, 'composition': 9, 'process': 6, 'o...
2  every   c                                     {'process': 2}
3   body   d  {'composition': 6, 'other_meds': 6, 'pathology...

In [3]: new_df = test[test.dict.apply(lambda x: list(x.values())[0] == list(x.values())[1] if len(x) > 1 else None) == True]
   ...: new_df
Out[3]: 
     one two                                               dict
1  there   b  {'food': 9, 'composition': 9, 'process': 6, 'o...
3   body   d  {'composition': 6, 'other_meds': 6, 'pathology...
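In the lambda above, dicts with fewer than two entries map to None, which is why the result is compared with == True. A hedged variant (not the answerer's code) lets `and` short-circuit to False instead, so the applied result is already a boolean mask:

```python
import pandas as pd

test = pd.DataFrame({'one': ['hello', 'there', 'every', 'body'],
                     'two': ['a', 'b', 'c', 'd'],
                     'dict': [{'composition': 12, 'process': 4, 'pathology': 4},
                              {'food': 9, 'composition': 9, 'process': 6, 'other_meds': 3},
                              {'process': 2},
                              {'composition': 6, 'other_meds': 6, 'pathology': 2, 'process': 1}]})

# len(x) > 1 is False for short dicts, so the lambda never returns None
# and no "== True" comparison is needed before indexing
new_df = test[test['dict'].apply(
    lambda x: len(x) > 1 and list(x.values())[0] == list(x.values())[1]
)]
print(new_df)
```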
Amir saleem