1

I have a pandas dataframe and I need to create a new column based on an if-else condition. This question already came up here multiple times (e.g., Creating a new column based on if-elif-else condition).

However, I cannot apply the proposed solution, since I also need to look up values in a list in order to check the condition. I cannot do this with the proposed solution, because I am not sure how I can access my lookup-list in the external function. My lookup-list would need to be global, which I want to avoid. I have the feeling there should be a better way to do this.

Consider the following dataframe df:

letters
A
B
C
D
E
F

I also have a list which contains lookup values:

lookup = [C,D]

Now, I want to create a new column in my dataframe which contains 1 if the respective value is contained in lookup and 0 if the values is not in lookup.

The typical approach would be:

df.apply(helper, axis=1)

def helper(row):
  if(row['letters'].isin(lookup)):
     row['result'] = 1
  else:
     row['result'] = 0

However, I do not know how I can access lookup in helper() without making it global.

The result should look like this:

letters    result
A          0
B          0
C          1
D          1
E          0
F          0
Community
  • 1
  • 1
beta
  • 5,324
  • 15
  • 57
  • 99
  • 3
    This is very similar to this: http://stackoverflow.com/questions/38499890/how-to-use-pandas-apply-function-on-all-columns-of-some-rows-of-data-frame you just need to do `df.loc[df['letters'].isin(lookup),'result'] = 1` – EdChum Jul 21 '16 at 10:28
  • 1
    Or more complete is `df['result'] = np.where(df['letters'].isin(lookup), 1, 0)` – EdChum Jul 21 '16 at 10:40

1 Answers1

2

Although this question is very similar to the question: How to use pandas apply function on all columns of some rows of data frame

I think here it's worth showing a couple methods, on a single line using np.where with a boolean mask generated from isin, isin will return a boolean Series where any rows contain any matches in your list:

In [71]:
lookup = ['C','D']
df['result'] = np.where(df['letters'].isin(lookup), 1, 0)
df

Out[71]:
  letters  result
0       A       0
1       B       0
2       C       1
3       D       1
4       E       0
5       F       0

here using 2 loc statements and using ~ to invert the mask:

In [72]:
df.loc[df['letters'].isin(lookup),'result'] = 1
df.loc[~df['letters'].isin(lookup),'result'] = 0
df

Out[72]:
  letters  result
0       A       0
1       B       0
2       C       1
3       D       1
4       E       0
5       F       0
Community
  • 1
  • 1
EdChum
  • 376,765
  • 198
  • 813
  • 562