I have a data frame like this:
>>> MCQ_DATA['q45']
10 [13, 14]
11 [13, 14]
12 [13, 12]
13 [13, 12]
14 [9, 12]
15 [2, 6]
16 [2]
17 [16]
18 [13, 12]
19 [13, 11]
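To make the question reproducible, here is a minimal sketch that rebuilds the column above. It assumes the left-hand numbers are the index labels 10-19 and that only the `q45` column matters, since any other columns are not shown.

```python
import pandas as pd

# Sample data copied from the output above; the index labels 10-19 are assumed
# to be the row labels shown on the left.
MCQ_DATA = pd.DataFrame(
    {
        "q45": [
            [13, 14], [13, 14], [13, 12], [13, 12], [9, 12],
            [2, 6], [2], [16], [13, 12], [13, 11],
        ]
    },
    index=range(10, 20),
)

# Each cell holds a Python list, so the column has object dtype.
print(MCQ_DATA["q45"])
```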
So all values for column 'q45' are lists. I want to create a boolean filter for rows that contain 13, like:
>>> [MCQ_DATA['q45'] == 13]
10 True
11 True
12 True
13 True
14 False
15 False
16 False
17 False
18 True
19 True
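For comparison, here is a short sketch of what the two kinds of test actually do on this column, using the sample values above (rebuilt so the snippet runs on its own). `==` compares each whole list with the scalar 13, so no row matches. A per-element `in` test produces the desired mask, which can then be used to select the rows.

```python
import pandas as pd

# Rebuild the sample column (values and index copied from the question).
s = pd.Series(
    [[13, 14], [13, 14], [13, 12], [13, 12], [9, 12],
     [2, 6], [2], [16], [13, 12], [13, 11]],
    index=range(10, 20),
    name="q45",
)

# `==` compares each whole list object to the int 13; a list never equals
# an int, so every row is False.
eq_mask = s == 13

# A per-element membership test gives one bool per row: does the list contain 13?
contains_13 = s.apply(lambda sublist: 13 in sublist)

print(eq_mask.any())                   # False
print(contains_13.tolist())
print(s[contains_13].index.tolist())   # [10, 11, 12, 13, 18, 19]
```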
Already tried these:

MCQ_DATA['q45'] == 13
returns False for everything.

MCQ_DATA['q45'].isin([13])
returns a TypeError. Also, this looks for values against a list, whereas I want to check a nested list in a dataframe for a value. (From "Use a list of values to select rows from a pandas dataframe".)

df.loc[df['q45'] == 13]
returns an empty dataframe, because none of the values are 13. (From "Select rows from a DataFrame based on values in a column in pandas".)

df['q45'].apply(lambda sublist: 13 in sublist)
Finally, this worked. But the source says this is not an efficient way to do it. (From "Operating on tuples held within Pandas DataFrame column".)

After looking on SO for the prerequisite half-hour: if the last way is not the right way, what IS the right way?
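One commonly used pandas alternative, offered here as a sketch rather than a definitive answer, is to `explode` the lists into one row per element, compare with 13, and collapse back to one value per original row with `groupby(level=0).any()`. This assumes the index is unique, as it is in the sample. NaN cells explode to NaN, which compares as False, so no `dropna()` is needed. The NaN row at index 20 is an addition for illustration only. Whether this beats `apply` depends on data size and list lengths, so it is not guaranteed to be faster.

```python
import numpy as np
import pandas as pd

# Sample column from the question, plus a hypothetical NaN row (index 20)
# to show that missing values need no special handling here.
s = pd.Series(
    [[13, 14], [13, 14], [13, 12], [13, 12], [9, 12],
     [2, 6], [2], [16], [13, 12], [13, 11], np.nan],
    index=range(10, 21),
    name="q45",
)

# One row per list element; each original index label is repeated.
exploded = s.explode()

# Compare element-wise, then reduce back to one bool per original row.
# Requires a unique index, because rows are regrouped by index label.
mask = exploded.eq(13).groupby(level=0).any()

print(mask.tolist())
print(s[mask].index.tolist())   # [10, 11, 12, 13, 18, 19]
```

Because the mask keeps the original index labels, `MCQ_DATA.loc[mask]` would select the matching rows of the full frame.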
Further testing found that this also works. I would say it is the best approach: it uses the pandas framework and is easily readable:
df['q45'].dropna().apply({13}.issubset)
I'm guessing this is faster if done on a large scale, but maybe someone knows. (I needed the .dropna() because NaN gives an error.)
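Two follow-ups, as a hedged sketch. First, `dropna()` shortens the result, so the mask no longer has one entry per row of the frame. `reindex(..., fill_value=False)` puts the dropped rows back as False. Second, rather than guessing which approach is faster, a small `timeit` harness can compare them on real data. The synthetic column below is made up for illustration: its size, value range, NaN rate and seed are all assumptions. No timings are claimed here, because they vary with pandas version and data shape.

```python
import random
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 20_000 rows of short lists, about 5% NaN.
random.seed(0)
values = [
    np.nan if random.random() < 0.05
    else [random.randint(1, 16) for _ in range(random.randint(1, 3))]
    for _ in range(20_000)
]
s = pd.Series(values, name="q45")


def with_lambda(col):
    # NaN is a float, so guard with isinstance instead of calling dropna().
    return col.apply(lambda x: isinstance(x, list) and 13 in x)


def with_issubset(col):
    # The approach from the question; reindex restores the dropped NaN rows as False.
    return col.dropna().apply({13}.issubset).reindex(col.index, fill_value=False)


def with_explode(col):
    # Explode to one row per element, compare, then reduce per original row.
    return col.explode().eq(13).groupby(level=0).any()


# All three should agree before it makes sense to time them.
reference = with_lambda(s).tolist()
for fn in (with_issubset, with_explode):
    assert fn(s).tolist() == reference, fn.__name__

for fn in (with_lambda, with_issubset, with_explode):
    secs = timeit.timeit(lambda: fn(s), number=5)
    print(f"{fn.__name__}: {secs / 5:.4f} s per call")
```

A note on the design choice: the `isinstance` guard in `with_lambda` avoids both `dropna()` and the reindex step, so its mask always lines up with the original rows.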