
I am new to Python. I am trying to subset data from a pandas DataFrame using the values in a list. Below is a simple example of what I am trying to do.

import pandas as pd

# Create dataframe df which contains only one column having weekdays as values
df = pd.DataFrame({'days':['monday','tuesday','wednesday','thursday','friday']})

# A list containing all seven days of a week
day_list = ['monday','tuesday','wednesday','thursday','friday','saturday','sunday']

# Create a new dataframe which should contain values present in list but missing in dataframe
df1 = df[~df.days.isin(day_list)]

# Output shows empty dataframe
Empty DataFrame
Columns: [days]
Index: []

# This gives error
df2 = df[~day_list.isin(df.days)]

# output from df2 code execution
df2 = df[~day_list.isin(df.days)]
AttributeError: 'list' object has no attribute 'isin'

In R, I can easily get this result using the below condition.

# Code from R
df1 <- day_list[! (day_list %in% df$days), ]

I want to create a new dataframe that contains only the values present in the list day_list but not in df.days. In this case, it should return 'saturday' and 'sunday'. How can I get this result? I have looked at the solution in this thread: How to implement 'in' and 'not in' for Pandas dataframe, but it does not solve my problem. Any guidance for a Python 3.x newbie would be really appreciated.

Code_Sipra

1 Answer


I believe you need numpy.setdiff1d together with the DataFrame constructor:

import numpy as np

df1 = pd.DataFrame({'all_days': np.setdiff1d(day_list, df['days'])})
print(df1)
   all_days
0  saturday
1    sunday
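
One thing to keep in mind: as far as I know, np.setdiff1d returns the unique values in sorted order, so the original order of the list is not preserved. In the example above that does not show, because 'saturday' happens to sort before 'sunday'. Here is a minimal sketch of the difference, assuming a hypothetical list week_from_sunday (not from the question) that starts the week on Sunday, with a plain list comprehension as an order-preserving alternative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'days': ['monday', 'tuesday', 'wednesday', 'thursday', 'friday']})

# Hypothetical list for illustration: the week starting on Sunday
week_from_sunday = ['sunday', 'monday', 'tuesday', 'wednesday',
                    'thursday', 'friday', 'saturday']

# setdiff1d: values in the list but not in df['days'], sorted and de-duplicated
missing_sorted = np.setdiff1d(week_from_sunday, df['days']).tolist()

# Order-preserving alternative: test membership against a set of the column values
present = set(df['days'])
missing_ordered = [d for d in week_from_sunday if d not in present]

print(missing_sorted)   # ['saturday', 'sunday']
print(missing_ordered)  # ['sunday', 'saturday']
```

If the order of the list matters, the isin approach or the list comprehension is probably the safer choice.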

Another solution is to convert the list to a pandas structure such as a Series or DataFrame and then use isin:

s = pd.Series(day_list)
s1 = s[~s.isin(df['days'])]

print(s1)
5    saturday
6      sunday
dtype: object

df2 = pd.DataFrame({'all_days': day_list})
df1 = df2[~df2['all_days'].isin(df['days'])]
print(df1)
   all_days
5  saturday
6    sunday
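
As for why the first attempt in the question came back empty: df.days.isin(day_list) checks which rows of df appear in the list, and all five do, so negating the mask drops every row. The filter has to be applied to the object that holds all seven days. The sketch below puts the two side by side; the reset_index(drop=True) call is an optional extra I am adding here, in case the leftover index labels 5 and 6 are not wanted:

```python
import pandas as pd

df = pd.DataFrame({'days': ['monday', 'tuesday', 'wednesday', 'thursday', 'friday']})
day_list = ['monday', 'tuesday', 'wednesday', 'thursday',
            'friday', 'saturday', 'sunday']

# Filtering df itself: every weekday is in day_list, so the negated mask is all False
empty = df[~df['days'].isin(day_list)]
print(empty.empty)  # True

# Filtering the full list instead, then renumbering the index from 0
df2 = pd.DataFrame({'all_days': day_list})
df1 = df2[~df2['all_days'].isin(df['days'])].reset_index(drop=True)
print(df1)
```

With the index reset, the two remaining rows are labelled 0 and 1 instead of 5 and 6.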
jezrael