
I have to find the rows where the `'we'` key in `Col C` equals 2:

import pandas as pd

# Create mock dataframe
df = pd.DataFrame([
    [20, 30, {'ab':1, 'we':2, 'as':3}, 'String1'],
    [21, 31, {'ab':4, 'we':5, 'as':6}, 'String2'],
    [22, 32, {'ab':7, 'we':2, 'as':9}, 'String2'],
], columns=['Col A', 'Col B', 'Col C', 'Col D'])

How can I do this when `Col C` contains a dict?

Henry Ecker
Santhosh

1 Answer

We can use the `str` accessor to get the value from each dict, then use a normal comparison to test it. In this example, select the `'we'` key and compare with `eq(2)`. The resulting boolean mask (`m`) can be used to filter the DataFrame down to the matching rows:

m = df['Col C'].str['we'].eq(2)
filtered_df = df[m]
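A self-contained sketch of the mask above. The fourth row, whose dict has no `'we'` key, is a hypothetical addition (not in the question) to show that a missing key just yields a missing value, which `eq(2)` treats as "no match". The `apply`/`dict.get` line is a swapped-in alternative, not part of this answer, used only to cross-check the mask.

```python
import pandas as pd

# Mock data from the question, plus a hypothetical fourth row without a 'we' key
df = pd.DataFrame([
    [20, 30, {'ab': 1, 'we': 2, 'as': 3}, 'String1'],
    [21, 31, {'ab': 4, 'we': 5, 'as': 6}, 'String2'],
    [22, 32, {'ab': 7, 'we': 2, 'as': 9}, 'String2'],
    [23, 33, {'ab': 0, 'as': 0}, 'String3'],
], columns=['Col A', 'Col B', 'Col C', 'Col D'])

# .str['we'] looks the key up in each dict; a missing key gives a missing value,
# and comparing a missing value with eq(2) gives False
m = df['Col C'].str['we'].eq(2)

# Alternative (not from the answer): call dict.get on each cell via apply
m_apply = df['Col C'].apply(lambda d: d.get('we')).eq(2)

print(m.tolist())
print(m.equals(m_apply))
print(df[m])
```

Both masks should agree. The `str` accessor form stays vectorised in spelling and avoids writing a lambda, while `apply` makes the per-row dict lookup explicit.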

If you are going to assign to `filtered_df` later, use `copy` to avoid a `SettingWithCopyWarning`:

filtered_df = df[m].copy()
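A minimal sketch of why `copy` helps, assuming a later assignment into `filtered_df` (the `filtered_df['Col A'] = 0` step is hypothetical). It records any warnings raised around the assignment and checks that the copy can be modified on its own without touching `df`. The warning is matched by class name only, since whether pandas emits (or even defines) `SettingWithCopyWarning` depends on the pandas version.

```python
import warnings

import pandas as pd

# Mock data from the question
df = pd.DataFrame([
    [20, 30, {'ab': 1, 'we': 2, 'as': 3}, 'String1'],
    [21, 31, {'ab': 4, 'we': 5, 'as': 6}, 'String2'],
    [22, 32, {'ab': 7, 'we': 2, 'as': 9}, 'String2'],
], columns=['Col A', 'Col B', 'Col C', 'Col D'])
m = df['Col C'].str['we'].eq(2)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of deduplicating
    filtered_df = df[m].copy()       # independent object, not flagged as a possible copy
    filtered_df['Col A'] = 0         # hypothetical later assignment

# Keep only SettingWithCopyWarning entries (matched by name, see lead-in)
setting_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]

print(len(setting_warnings))
print(filtered_df['Col A'].tolist())
print(df['Col A'].tolist())  # the original frame is left unchanged
```

The trade-off, as noted in the comments below, is that `copy` duplicates the selected data, which costs memory for large frames.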

Naturally this can be done in one line without a separate variable:

filtered_df = df[df['Col C'].str['we'].eq(2)].copy()

filtered_df:

   Col A  Col B                        Col C    Col D
0     20     30  {'ab': 1, 'we': 2, 'as': 3}  String1
2     22     32  {'ab': 7, 'we': 2, 'as': 9}  String2
Henry Ecker
  • What's the difference between passing `df.columns` as the second argument of `df.loc`, as opposed to `:` or omitting the second argument altogether? i.e. why not use `df.loc[m, :]` or just `df.loc[m]` instead? – SeaBean Jul 24 '21 at 18:18
  • Hi Henry, I tested 4 variants, `df[m]`, `df.loc[m]`, `df.loc[m, :]` and `df.loc[m, df.columns]`, also on Pandas 1.3.0. Only `df[m]` gets the `SettingWithCopyWarning`. None of the 3 `df.loc` variants gave any warning or error message. Can you please point me to any documentation or article that mentions this? I've also checked the [classic question in SO that many of us refer to](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) but found no mention of this. You may also consider sharing this point on that question. – SeaBean Jul 24 '21 at 21:03
  • 1
    Thanks Henry. I'll also try the `_is_copy` test probably tomorrow. Will let you know the finding afterwards. Thanks! :-) – SeaBean Jul 24 '21 at 21:30
  • @SeaBean apparently explicitly selecting columns is an edge case which does not correctly set `_is_copy`. I've corrected my answer to use `copy` which is the correct way to handle this case. [Issue](https://github.com/pandas-dev/pandas/issues/42703) – Henry Ecker Jul 25 '21 at 00:38
  • Hi Henry, I tested with `_is_copy` and `_is_view` but couldn't arrive at a concrete conclusion like you did. Yes, using `copy`, though it may come at a cost, is a sure way to avoid the warning. Upvoted. – SeaBean Jul 25 '21 at 14:31
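The four selection forms compared in the comments can be checked side by side. This sketch, using the mock data from the question, only confirms that they return the same rows, columns and values. Whether a later assignment warns depends on internal `_is_copy` bookkeeping and on the pandas version, as discussed in the comments and the linked issue, so it is not tested here.

```python
import pandas as pd

# Mock data from the question
df = pd.DataFrame([
    [20, 30, {'ab': 1, 'we': 2, 'as': 3}, 'String1'],
    [21, 31, {'ab': 4, 'we': 5, 'as': 6}, 'String2'],
    [22, 32, {'ab': 7, 'we': 2, 'as': 9}, 'String2'],
], columns=['Col A', 'Col B', 'Col C', 'Col D'])
m = df['Col C'].str['we'].eq(2)

# The four selection forms compared in the comments
variants = {
    "df[m]": df[m],
    "df.loc[m]": df.loc[m],
    "df.loc[m, :]": df.loc[m, :],
    "df.loc[m, df.columns]": df.loc[m, df.columns],
}

reference = variants["df[m]"]
for name, result in variants.items():
    # Every form selects the same rows and columns with the same values
    print(name, result.index.tolist(), result.equals(reference))
```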