select column in pandas dataframe whose nested list value matches a given value

Question

I have a data frame like this:

>>> MCQ_DATA['q45']
10    [13, 14]
11    [13, 14]
12    [13, 12]
13    [13, 12]
14     [9, 12]
15      [2, 6]
16         [2]
17        [16]
18    [13, 12]
19    [13, 11]

So all values for column 'q45' are lists. I want to create a boolean filter for rows that contain 13, like:

>>> [MCQ_DATA['q45'] == 13]
10    True
11    True
12    True
13    True
14    False
15    False
16    False
17    False
18    True
19    True

Already tried these:

[MCQ_DATA['q45'] == 13] returns False for everything.
[MCQ_DATA['q45'].isin([13])] raises a TypeError. Also, isin checks each value against a list, whereas I want to check each nested list in the dataframe for a value. (From "Use a list of values to select rows from a pandas dataframe".)
df.loc[df['q45'] == 13] returns an empty dataframe, because no cell is equal to 13. (From "Select rows from a DataFrame based on values in a column in pandas".)
df['q45'].apply(lambda sublist: 13 in sublist) finally worked. But the source says this is not an efficient way to do it. (From "Operating on tuples held within Pandas DataFrame column".)
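
To make the difference between the first and last attempts concrete, here is a minimal sketch on toy data shaped like the column above. The Series `q45` and the sample values are illustrative stand-ins, not the real MCQ_DATA.

```python
import pandas as pd

# Illustrative stand-in for MCQ_DATA['q45']: every cell holds a list.
q45 = pd.Series(
    [[13, 14], [13, 12], [9, 12], [2, 6], [2], [16], [13, 11]],
    index=[10, 12, 14, 15, 16, 17, 19],
    name="q45",
)

# Scalar comparison asks "is this cell equal to 13?".
# Each cell is a list, and no list equals the integer 13.
eq_mask = q45 == 13

# A per-row membership test asks "does this cell's list contain 13?".
in_mask = q45.apply(lambda sublist: 13 in sublist)

print(eq_mask.any())     # whether any cell compared equal to 13
print(in_mask.tolist())  # one boolean per row
print(q45[in_mask])      # rows whose list contains 13
```

The membership mask has the same index as the column, so it can be passed straight to `df.loc[...]` on the parent dataframe.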

After searching SO for the requisite half-hour, I'm still stuck: if the last way is not the right way, what IS the right way?

Further testing found that this also works. I think it is the best approach, since it stays within pandas and is easy to read:

df['q45'].dropna().apply({13}.issubset)

I'm guessing this is faster at large scale, but maybe someone knows. (I needed the .dropna() because NaN raises an error.)
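
For the NaN case, here is a hedged sketch of two ways to build the mask without dropping rows, so that it stays aligned with the original index. The Series `s` and the helper `contains` are hypothetical names for illustration. `Series.explode` requires pandas 0.25 or later, and neither variant has been benchmarked here.

```python
import numpy as np
import pandas as pd

# Illustrative column: lists mixed with a missing value.
s = pd.Series([[13, 14], np.nan, [9, 12], [2], [13, 11]], name="q45")

def contains(value, target=13):
    """Hypothetical helper: non-list cells (such as NaN) count as no match."""
    return isinstance(value, list) and target in value

# Option 1: guard inside apply, so NaN rows become False instead of raising.
mask_apply = s.apply(contains)

# Option 2: explode each list into one row per element, compare against
# the target, then collapse back to one boolean per original index label.
mask_explode = s.explode().eq(13).groupby(level=0).any()

print(mask_apply.tolist())
print(mask_explode.tolist())
```

Both masks keep one entry per original row, which is not true of a mask built after .dropna().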

I've updated the dupe target answer with you in mind. Hope that helps. — piRSquared, Oct 11 '18 at 18:11
thanks @piRSquared -- but my question was of the many ways you outline, which is the best way, if you have to run this at scale? — Marc Maxmeister, Oct 11 '18 at 18:14
This is an art and depends on your situation. I suspect my Numpy approach would be quicker. If you'd like, I can reopen this question to encourage someone to answer with more explanation. But I do believe the information you need is in the answers from the dupe target. — piRSquared, Oct 11 '18 at 18:16
Okay @piRSquared I have another problem. Not exactly this - but I'm getting TypeError (float) issues when applying the `.issubset` method and these are being caused by `nan` in the data (np.nan must be type(float)) -- how can I have it ignore the `nan` parts while applying the filter? — Marc Maxmeister, Oct 11 '18 at 18:33
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181695/discussion-between-marc-maxson-and-pirsquared). — Marc Maxmeister, Oct 11 '18 at 18:40
