Subset missing values from list into new dataframe in Python 3.x

Question

I am new to Python. I am trying to subset data from a pandas dataframe using values present in a list. Below is a simple example of what I am trying to do.

import pandas as pd

# Create dataframe df which contains only one column having weekdays as values
df = pd.DataFrame({'days':['monday','tuesday','wednesday','thursday','friday']})

# A list containing all seven days of a week
day_list = ['monday','tuesday','wednesday','thursday','friday','saturday','sunday']

# Create a new dataframe which should contain values present in list but missing in dataframe
df1 = df[~df.days.isin(day_list)]

# Output shows empty dataframe
Empty DataFrame
Columns: [days]
Index: []

# This gives error
df2 = df[~day_list.isin(df.days)]

# output from df2 code execution
df2 = df[~day_list.isin(df.days)]
AttributeError: 'list' object has no attribute 'isin'

In R, I can easily get this result using the below condition.

# Code from R
df1 <- day_list[! (day_list %in% df$days), ]

I want to create a new dataframe which contains only those values present in the list day_list but not present in df.days. In this case, it should return 'saturday' and 'sunday' as output. How can I get this result? I have looked at the solution provided in this thread - How to implement 'in' and 'not in' for Pandas dataframe. But it is not solving my problem. Any guidance on this to a Python 3.x newbie would really be appreciated.

jezrael · Accepted Answer · 2018-01-07T11:33:52.357

2

I believe you need numpy.setdiff1d with the DataFrame constructor:

import numpy as np

df1 = pd.DataFrame({'all_days': np.setdiff1d(day_list, df['days'])})
print(df1)
   all_days
0  saturday
1    sunday
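
Note that np.setdiff1d returns the sorted unique values, so the result does not necessarily follow the order of day_list (here 'saturday' and 'sunday' just happen to sort the same way). If the original order matters, a plain list comprehension is one option. This is only a minimal sketch, assuming the same df and day_list as in the question; the names present and missing are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'days': ['monday', 'tuesday', 'wednesday', 'thursday', 'friday']})
day_list = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']

# Set of values already in the column, for fast membership tests
present = set(df['days'])

# Keep the values of day_list that are not in the column, in list order
missing = [day for day in day_list if day not in present]

df1 = pd.DataFrame({'all_days': missing})
print(df1)
```

Unlike setdiff1d, this keeps duplicates from day_list if there are any.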

Another solution is to convert the list to a pandas structure such as a Series or DataFrame and use isin:

s = pd.Series(day_list)
s1 = s[~s.isin(df['days'])]

print(s1)
5    saturday
6      sunday
dtype: object

df2 = pd.DataFrame({'all_days': day_list})
df1 = df2[~df2['all_days'].isin(df['days'])]
print(df1)
   all_days
5  saturday
6    sunday
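
The Series and DataFrame versions keep the original index labels (5 and 6). If a fresh 0-based index is wanted, as in the numpy version, reset_index(drop=True) can be chained on. A small sketch, assuming the same data as in the question:

```python
import pandas as pd

df = pd.DataFrame({'days': ['monday', 'tuesday', 'wednesday', 'thursday', 'friday']})
day_list = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']

# Turn the list into a one-column DataFrame so that isin is available
df2 = pd.DataFrame({'all_days': day_list})

# Keep rows whose value is not in df['days'], then renumber the index from 0
df1 = df2[~df2['all_days'].isin(df['days'])].reset_index(drop=True)
print(df1)
```

With drop=True the old index labels are discarded instead of being added as a new column.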

edited Jan 07 '18 at 11:33

answered Jan 07 '18 at 11:31

jezrael


So you are suggesting I convert the list into a dataframe first? – Code_Sipra Jan 07 '18 at 11:33
You need the difference between the list and the column in `df`, so a pure pandas solution needs that conversion. Otherwise, use the numpy solution. – jezrael Jan 07 '18 at 11:34
+1 for your simple solution, @jezrael. I tried the third option by converting the list to a dataframe and it worked perfectly. – Code_Sipra Jan 07 '18 at 11:45
@Code_Sipra - no problem ;) – jezrael Jan 07 '18 at 12:18
