5

Is it possible to maintain multiple columns at once using np.where? usually, one column is maintained with np.where, so my coding would look like this:

df['col1] = np.where(df[df.condition == 'yes'],'sth', '')
df['col2'] = np.where(df[df.condition == 'yes'], 50.00, 0.0)

But caused by the fact that I test for the same condition twice, I was wondering, if I can pass 2 columns and fill them both in one run.

I tried this:

df['col1','col2'] = np.where(df[df.condition == 'yes'],['sth',50.00], ['',0.0])

But it does not work. Is there a way to realize this?

Thanks :)

MaMo
  • 569
  • 1
  • 10
  • 27
  • What do you mean by "maintain multiple columns at once"? Can you show a sample of the data you have and the result you are trying to achieve? – jdehesa Mar 23 '18 at 10:15
  • Possible duplicate of [Numpy where function multiple conditions](https://stackoverflow.com/questions/16343752/numpy-where-function-multiple-conditions) – Rafael Mar 23 '18 at 10:15
  • @jezrael, I'm struggling to understand this question. If you understand it, would mind editing the question? I can see this confusing many users, not just me! – jpp Mar 24 '18 at 21:05

1 Answers1

2

I think need reshape boolean mask to (N x 1):

m = df.condition == 'yes'
df[['col1','col2']] = pd.DataFrame(np.where(m[:, None], ['sth',50.00], ['',0.0]))

Only disadvatage of solution is if different types of values in lists - numeric with strings - then numpy.where both output columns convert to strings.

Sample:

df = pd.DataFrame({'A':list('abcdef'),
                     'condition':['yes'] * 3 + ['no'] * 3})

print (df)
   A condition
0  a       yes
1  b       yes
2  c       yes
3  d        no
4  e        no
5  f        no

m = df.condition == 'yes'
df[['col1','col2']] = pd.DataFrame(np.where(m[:, None], ['sth',50.00], ['',0.0]))
print (df)
   A condition col1  col2
0  a       yes  sth  50.0
1  b       yes  sth  50.0
2  c       yes  sth  50.0
3  d        no        0.0
4  e        no        0.0
5  f        no        0.0

print (df.applymap(type))
               A      condition           col1           col2
0  <class 'str'>  <class 'str'>  <class 'str'>  <class 'str'>
1  <class 'str'>  <class 'str'>  <class 'str'>  <class 'str'>
2  <class 'str'>  <class 'str'>  <class 'str'>  <class 'str'>
3  <class 'str'>  <class 'str'>  <class 'str'>  <class 'str'>
4  <class 'str'>  <class 'str'>  <class 'str'>  <class 'str'>
5  <class 'str'>  <class 'str'>  <class 'str'>  <class 'str'>

EDIT: I test it with NaNs values:

df = pd.DataFrame({'A':list('abcdefghi'),
                     'condition':['yes'] * 3 + ['no'] * 3 + [np.nan] * 3})

m = df.condition == 'yes'
df[['col1','col2']] = pd.DataFrame(np.where(m[:, None], ['sth',50.00], ['',0.0]))
print (df)
   A condition col1  col2
0  a       yes  sth  50.0
1  b       yes  sth  50.0
2  c       yes  sth  50.0
3  d        no        0.0
4  e        no        0.0
5  f        no        0.0
6  g       NaN        0.0
7  h       NaN        0.0
8  i       NaN        0.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thank you, this almost works. Any idea, why it does not work for the last 6 rows? col1 and col2 both contain nan within these last rows, all other rows have valid content. – MaMo Mar 25 '18 at 12:54
  • @MaMo - hmmm, it seems there is problem need another condition for exclude `NaNs`. – jezrael Mar 25 '18 at 13:01
  • @MaMo - I add sample to answer with `NaN`s and it working nice. Can you test it? Or I dont understand what you need? – jezrael Mar 25 '18 at 13:16
  • I get `NaN`s in col1 and col2 for last few rows, not in the condition column. I so not know the reason because there is nothing special about those rows or the condition column. – MaMo Mar 26 '18 at 11:22
  • @MaMo - Without data hard to know, are data confidental? – jezrael Mar 26 '18 at 11:25
  • yes, they are, but problem solved. It has been my mistake, the df I used has been a subset of another df so the indexing has been the problem. Added `df.reset_index()` before executing the `np.where`-Coding and voilà :) Thanks again! – MaMo Mar 26 '18 at 13:08