Pandas Dataframe Check if column value is in column list

Question

I have a dataframe df:

import pandas as pd

data = {'id':[12,112],
        'idlist':[[1,5,7,12,112],[5,7,12,111,113]]
       }
df=pd.DataFrame.from_dict(data)

which looks like this:

    id                idlist
0   12    [1, 5, 7, 12, 112]
1  112  [5, 7, 12, 111, 113]

I need to check and see if id is in the idlist, and select or flag it. I have tried variations of the following and receive the commented error:

df=df.loc[df.id.isin(df.idlist),:] #TypeError: unhashable type: 'list'
df['flag']=df.where(df.idlist.isin(df.idlist),1,0) #TypeError: unhashable type: 'list'

Could .apply or a list comprehension be another way to solve this?

I am looking for a solution here that either selects the rows where id is in idlist, or flags the row with a 1 where id is in idlist. The resulting df should be either:

   id              idlist
0  12  [1, 5, 7, 12, 112]

or:

   flag   id                idlist
0     1   12    [1, 5, 7, 12, 112]
1     0  112  [5, 7, 12, 111, 113]

Thanks for the help!

jezrael · Accepted Answer · score 18 · 2017-11-27T15:03:03.047

Use apply:

df['flag'] = df.apply(lambda x: int(x['id'] in x['idlist']), axis=1)
print (df)
    id                idlist  flag
0   12    [1, 5, 7, 12, 112]     1
1  112  [5, 7, 12, 111, 113]     0

Similar:

df['flag'] = df.apply(lambda x: x['id'] in x['idlist'], axis=1).astype(int)
print (df)
    id                idlist  flag
0   12    [1, 5, 7, 12, 112]     1
1  112  [5, 7, 12, 111, 113]     0

With list comprehension:

df['flag'] = [int(x[0] in x[1]) for x in df[['id', 'idlist']].values.tolist()]
print (df)
    id                idlist  flag
0   12    [1, 5, 7, 12, 112]     1
1  112  [5, 7, 12, 111, 113]     0
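A small variant of the list comprehension pairs the two columns with zip instead of building an intermediate nested list through .values.tolist(). This idiom is swapped in and is not part of this answer. It should produce the same flags. Whether it is faster on a particular dataset is an assumption you should measure rather than take for granted.

```python
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

# zip walks both columns in lockstep, so each id is tested only
# against the list stored in the same row.
df['flag'] = [int(i in lst) for i, lst in zip(df['id'], df['idlist'])]
print(df)
```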

Solutions for filtering:

df = df[df.apply(lambda x: x['id'] in x['idlist'], axis=1)]
print (df)
   id              idlist
0  12  [1, 5, 7, 12, 112]

df = df[[x[0] in x[1] for x in df[['id', 'idlist']].values.tolist()]]
print (df)
   id              idlist
0  12  [1, 5, 7, 12, 112]
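If a Python-level loop over rows is a concern, one alternative (not taken from this answer) works in three steps:

1. Explode each list into one row per element.
2. Compare each element with that row's id.
3. Collapse back to one flag per original row.

This sketch assumes pandas 0.25 or later, which added DataFrame.explode. It also assumes a unique index, because the regrouping relies on the repeated index labels.

```python
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

# One row per list element; each original index label is repeated,
# which is why the index must be unique for the regrouping to be correct.
exploded = df.explode('idlist')

# Element-wise comparison of each list element with that row's id.
matches = exploded['idlist'].eq(exploded['id'])

# Collapse back to one value per original row: True if any element matched.
df['flag'] = matches.groupby(level=0).any().astype(int)

# Filtering reuses the same flag as a boolean mask.
selected = df[df['flag'].eq(1)]
print(df)
print(selected)
```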

edited Nov 27 '17 at 15:03

answered Nov 27 '17 at 14:50

jezrael

timeits on my df of 1.6m rows: – clg4 Nov 27 '17 at 16:11
I think list comprehension solution should be faster. – jezrael Nov 27 '17 at 16:17
here they are, timeits on my df of 1.6m rows:
2min10s | df['flag'] = df.loc[:, ('id', 'idlist')].apply(lambda x: 1 if x[0] in x[1] else 0, axis=1)
1min55s | df['flag'] = df.apply(lambda x: int(x['id'] in x['idlist']), axis=1)
1min54s | df['flag'] = df.apply(lambda x: x['id'] in x['idlist'], axis=1).astype(int)
1min24s | df['flag'] = [int(x[0] in x[1]) for x in df[['id', 'idlist']].values.tolist()]
        | df.apply(lambda x : set([x.id]).issubset(x.idlist),1).astype(int)
The various loop-based answers below, and the deleted ones, were much slower. Winner is the list comprehension version from @jezrael! – clg4 Nov 27 '17 at 16:19

winner is the list comprehension based on my timeits – clg4 Nov 27 '17 at 16:26
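To reproduce a comparison like the timings above on other data, you could use a rough harness built on the standard-library timeit module. The synthetic frame, its size, and the helper names flag_apply and flag_listcomp are hypothetical. The printed numbers will vary with hardware and pandas version, so no particular ranking is implied.

```python
import random
import timeit

import pandas as pd

# Hypothetical synthetic data: 10,000 rows, each list holding 5 random ints.
random.seed(0)
n = 10_000
df = pd.DataFrame({
    'id': [random.randint(0, 50) for _ in range(n)],
    'idlist': [random.sample(range(50), 5) for _ in range(n)],
})

def flag_apply(frame):
    # Row-wise apply, as in the accepted answer.
    return frame.apply(lambda x: int(x['id'] in x['idlist']), axis=1)

def flag_listcomp(frame):
    # List comprehension over .values.tolist(), as in the accepted answer.
    return [int(x[0] in x[1]) for x in frame[['id', 'idlist']].values.tolist()]

for func in (flag_apply, flag_listcomp):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f'{func.__name__}: {seconds / 3:.4f} s per run')
```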

score 4 · Answer 2 · answered Nov 27 '17 at 14:43

You can use df.apply to process each row and create a new column, flag, that checks the condition. This gives the second output you requested.

df['flag'] = df.loc[:, ['id', 'idlist']].apply(lambda x: 1 if x.iloc[0] in x.iloc[1] else 0, axis=1)

print(df)

Here x.iloc[0] is id and x.iloc[1] is idlist. Use positional access through .iloc, because plain x[0] on a Series with string labels is deprecated in recent pandas.
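When the column names vary, you can wrap the same row-wise check in a small function. The helper flag_membership and its parameters are hypothetical and added only for illustration. It reads values by label (row[value_col]) rather than by position, so it does not depend on column order.

```python
import pandas as pd

def flag_membership(frame, value_col, list_col):
    """Return a 0/1 Series: 1 where frame[value_col] is in frame[list_col]."""
    # Label-based access inside apply keeps the check independent of column order.
    return frame.apply(lambda row: int(row[value_col] in row[list_col]), axis=1)

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})
df['flag'] = flag_membership(df, 'id', 'idlist')
print(df)
```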

answered Nov 27 '17 at 14:43

Aafaque Abdullah

you can just use column names for indexing, so: `df['flag'] = df.apply(lambda x: x.id in x.idlist, axis=1)` – Swier Nov 27 '17 at 14:51
@Swier - this is in my answer ;) – jezrael Nov 27 '17 at 14:52
@jezrael dammit, you beat me to it :P – Swier Nov 27 '17 at 14:56

rnso · Answer 3 · 2017-11-28T03:54:03.683

Try a simple for loop:

flaglist = []
for i in range(len(df)):
    if df.id[i] in df.idlist[i]:
        flaglist.append(1)
    else:
        flaglist.append(0)
df["flag"] = flaglist

df:

    id                idlist  flag
0   12    [1, 5, 7, 12, 112]     1
1  112  [5, 7, 12, 111, 113]     0
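The loop above reads rows with df.id[i], which is a label lookup, so it assumes the default 0..n-1 index. The sketch below rewrites the same loop with DataFrame.itertuples, a different technique from this answer. It visits rows in order whatever the index labels are. The non-default index in the example is an assumption chosen to show the difference.

```python
import pandas as pd

# Index labels deliberately not 0..n-1; df.id[i] with i from range(len(df))
# would raise KeyError here.
df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]},
                  index=[10, 20])

flaglist = []
for row in df.itertuples(index=False):
    # Each row is a namedtuple with fields named after the columns.
    flaglist.append(1 if row.id in row.idlist else 0)
df['flag'] = flaglist
print(df)
```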

To drop rows:

flaglist = []
for i in range(len(df)):
    if df.id[i] not in df.idlist[i]:
        flaglist.append(i)
df = df.drop(flaglist)

df:

   id              idlist  flag
0  12  [1, 5, 7, 12, 112]     1

The above can be converted to a list comprehension for creating a flag column:

df["flag"] = [df.id[i] in df.idlist[i]    for i in range(len(df))]
print(df)
#     id                idlist   flag
# 0   12    [1, 5, 7, 12, 112]   True
# 1  112  [5, 7, 12, 111, 113]  False

or

df["flag"] = [1 if df.id[i] in df.idlist[i] else 0    for i in range(len(df))]
print(df)
#     id                idlist  flag
# 0   12    [1, 5, 7, 12, 112]     1
# 1  112  [5, 7, 12, 111, 113]     0

and for selecting out rows:

flaglist = [i   for i in range(len(df))   if df.id[i] in df.idlist[i]]
print(df.iloc[flaglist])
#    id              idlist
# 0  12  [1, 5, 7, 12, 112]

BENY · Answer 4 · score 1 · 2017-11-27T16:30:53.213

By using issubset

df.apply(lambda x: set([x.id]).issubset(x.idlist), axis=1).astype(int)
Out[378]: 
0    1
1    0
dtype: int32

By using np.vectorize

import numpy as np

def myfun(x, y):
    # np.isin replaces np.in1d, which is deprecated as of NumPy 2.0
    return np.isin(x, y)


np.vectorize(myfun)(df.id, df.idlist).astype(int)

Timing:

%timeit np.vectorize(myfun)(df.id,df.idlist).astype(int)
10000 loops, best of 3: 92.3 µs per loop
%timeit df.apply(lambda  x : set([x.id]).issubset(x.idlist),1).astype(int)
1000 loops, best of 3: 353 µs per loop
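The NumPy documentation describes np.vectorize as a convenience rather than a performance tool, because it is essentially a for loop. That fits the slower result reported on a large dataset in the comments. The minimal sketch below is not from this answer. It fixes the output type with otypes, so NumPy does not need a trial call on the first element to infer it, and it replaces np.in1d with a plain in test.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

# otypes=[bool] fixes the output dtype up front; on each call the lambda
# receives one id and the list object from the same row.
is_member = np.vectorize(lambda x, y: x in y, otypes=[bool])
flags = is_member(df['id'], df['idlist']).astype(int)
print(flags)  # [1 0]
```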

edited Nov 27 '17 at 16:30

answered Nov 27 '17 at 15:07

BENY

@clg4 one sec, let me provide another one – BENY Nov 27 '17 at 16:25
tried the vectorize option, and it was actually much slower than the others on my data set... – clg4 Dec 01 '17 at 19:10
