1

I need to fill cells in a column based off if another column contains a certain string.

I need to fill column B based off what's in C. Like if C contains 'hello;', then fill the corresponding cell in B with 'greet'. Then if C contains 'bye;', fill the corresponding cells in B with 'farewell'.

df1
   A    B       C        D
0  w       hello; Jon    q
1  x       bye; Jon      r
2  y       hello; Jack   s
3  z       bye; Jack     t
df1['B'] = np.where(df1['C'].str.contains('hello;'), 'greet', '')
df1['B'] = np.where(df1['C'].str.contains('bye;'), 'farewell', '')

This works; however, the next line of code overwrites the 'greet' from the first line. So I'm not sure how to combine the conditionals so they don't overwrite each other. What I want the end result to be is

df1
   A    B          C          D
0  w  greet     hello; Jon    q
1  x  farewell  bye; Jon      r
2  y  greet     hello; Jack   s
3  z  farewell  bye; Jack     t
DeAnna
  • 404
  • 6
  • 17
  • when you run the second `np.where` you will overwrite the contents of `B` since you're explicitly telling python to do that. Instead of `np.where` look at `np.select` – Yuca Aug 07 '19 at 15:19
  • `df1['C'].str.extract(r'(hello|bye)')[0].map({'hello': 'greet', 'bye': 'farewell'})` – user3483203 Aug 07 '19 at 15:20
  • Or nest the `np.where()` functions within one another, but I agree that native pandas solutions are best – G. Anderson Aug 07 '19 at 15:21
  • 1
    Never nest `np.where`. That is exactly what `np.select` is for. The former is unreadable – user3483203 Aug 07 '19 at 15:22

2 Answers2

2

If you’re only going to be dealing with a binary choice and all the values exist in the column, as per the example, then this should be fine:

 df1['B'] = np.where(df1['C'].str.contains('bye;'), 'farewell', 'greet')

From the numpy docs:

numpy.where(condition[, x, y])

Return elements chosen from x or y depending on condition.

If the condition is satisfied, it will return x, else it will fill with y.

However, np.select docs will be the one you want if you have more than one condition:

conditions = [
    df['C'].str.contains('hello;'),
    df['C'].str.contains('bye;')
]

np.select(conditions, ['greet', 'farewell'])

array(['greet', 'farewell', 'greet', 'farewell'], dtype='<U11')
user3483203
  • 50,081
  • 9
  • 65
  • 94
dijksterhuis
  • 1,225
  • 11
  • 25
  • NMDV, but if you are using `np.select`, you might as well put working code that solves the question, not just copy something from the documentation. – user3483203 Aug 07 '19 at 15:34
  • @user3483203 only on my phone so it’s hard to test/type (no straight quotes on ze iPhone). Also, what does NMDV mean? – dijksterhuis Aug 07 '19 at 15:36
  • 1
    Not my downvote. I'll update your answer with a working version. – user3483203 Aug 07 '19 at 15:37
  • Ta. I was downvoted before I even included the np.select edit anyway. Slightly confused, as np.where works perfectly fine in a binary choice case. – dijksterhuis Aug 07 '19 at 15:40
1

try using np.select

m1= df['C'].str.contains('hello;')
m2= df['C'].str.contains('bye;')

df['B'] = np.select(condlist=[m1 , m2], 
                         choicelist=['greet','farewell'])
tawab_shakeel
  • 3,701
  • 10
  • 26