pandas conditional logic with mixed dtypes

Question

I have a pandas dataframe as follows:

foo bar
a   b
1   10
2   25
3   9

I want to add a new column as follows:

foo bar baz
a   b   0
1   10  1
2   25  1
3   9   1

That is: if row['foo'] or row['bar'] is numeric, then row['baz'] = 1, else 0.

What I have so far is:

def some_function(row):
   if row['foo']>=0 or row['bar']>=0:
      return 1
   return 0

df['baz'] = df.apply(lambda row: some_function(row), axis=1)

But this doesn't work because the dtype is not int. I can't drop non-int rows, because I need them in the dataframe.

Any idea how I can solve this?

jezrael · Accepted Answer · 2017-10-17T15:26:25.653

5

To check for numeric values stored as strings, use to_numeric, then compare with ge (>=) and use all to check whether all values are True per row (use any instead if a single numeric column is enough):

df['baz'] = df.apply(pd.to_numeric, errors='coerce').ge(0).all(axis=1).astype(int)
print (df)
  foo bar  baz
0   a   b    0
1   1  10    1
2   2  25    1
3   3   9    1
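
The question asks for 1 when either column is numeric, while the line above requires both. A minimal sketch of the difference, assuming the values are stored as strings as in the question; the extra row ('x', '7') and the column names baz_all and baz_any are hypothetical, added only to show where the two reductions disagree:

```python
import pandas as pd

# Sample data from the question stored as strings, plus one hypothetical
# mixed row ('x', '7') where only 'bar' is numeric.
df = pd.DataFrame({'foo': ['a', '1', '2', '3', 'x'],
                   'bar': ['b', '10', '25', '9', '7']})

# Non-numeric strings become NaN, and NaN >= 0 evaluates to False.
numeric = df[['foo', 'bar']].apply(pd.to_numeric, errors='coerce')
mask = numeric.ge(0)

df['baz_all'] = mask.all(axis=1).astype(int)  # both columns must qualify
df['baz_any'] = mask.any(axis=1).astype(int)  # one column is enough
print(df)
```

Only the last row differs: baz_all is 0 there and baz_any is 1.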

Or, if you need to check the columns separately:

df['baz'] = (pd.to_numeric(df['foo'], errors='coerce').ge(0) | 
            pd.to_numeric(df['bar'], errors='coerce').ge(0)).astype(int)

Thanks to Zero for a solution that only checks whether the values are numeric:

df['baz'] = df.apply(pd.to_numeric, errors='coerce').notnull().all(axis=1).astype(int)
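
The null check and the ge(0) comparison give the same result on the question's data, but they are not equivalent once negative numbers appear. A small sketch, assuming hypothetical data with one negative value (the column names is_numeric and non_negative are illustrative only):

```python
import pandas as pd

# Hypothetical data: '-1' is numeric but fails a >= 0 comparison.
df = pd.DataFrame({'foo': ['a', '-1', '2'],
                   'bar': ['b', '10', '25']})

numeric = df[['foo', 'bar']].apply(pd.to_numeric, errors='coerce')

# 1 when every value in the row could be parsed as a number.
df['is_numeric'] = numeric.notna().all(axis=1).astype(int)

# 1 when every value in the row is a number and also >= 0.
df['non_negative'] = numeric.ge(0).all(axis=1).astype(int)
print(df)
```

If the goal is "is this a number at all", the null check is probably the safer of the two.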

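But if the columns hold actual numbers mixed with strings, compare the types instead. Note that this version sets baz to 1 for rows where every value is a string, which is the inverse of the flag in the question; compare with != str to flip it. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.

df = pd.DataFrame({'foo': ['a', 1, 2, 3], 'bar': ['b', 10, 25, 9]})

df['baz'] = (df.applymap(type) == str).all(axis=1).astype(int)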
print (df)
  bar foo  baz
0   b   a    1
1  10   1    0
2  25   2    0
3   9   3    0

Detail:

print (df.applymap(type))
             bar            foo
0  <class 'str'>  <class 'str'>
1  <class 'int'>  <class 'int'>
2  <class 'int'>  <class 'int'>
3  <class 'int'>  <class 'int'>
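
Comparing against str treats everything that is not a string as numeric. An alternative sketch, assuming object columns that mix Python numbers and strings, tests for numbers directly with isinstance; the helper is_number and the sample values 2.5 and '3' are hypothetical, not from the original answer:

```python
import numbers
import pandas as pd

# Hypothetical mixed data: real numbers, a float, and the string '3'.
df = pd.DataFrame({'foo': ['a', 1, 2.5, '3'],
                   'bar': ['b', 10, 25, 9]})

def is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, numbers.Number) and not isinstance(value, bool)

# Element-wise type test per column, then 1 if any column holds a number.
is_num = df[['foo', 'bar']].apply(lambda col: col.map(is_number))
df['baz'] = is_num.any(axis=1).astype(int)
print(df)
```

Here the string '3' does not count as numeric; use the to_numeric approach when numbers stored as strings should count.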

edited Oct 17 '17 at 15:26

answered Oct 17 '17 at 15:15

jezrael


Need `df.apply(pd.to_numeric, errors='coerce').notnull().all(1).astype(int)` perhaps. – Zero Oct 17 '17 at 15:17
@jezrael Thanks! Any idea how I can check if the value in 'foo' or 'bar' is >= 5? and apply the above condition? – Kvothe Oct 17 '17 at 15:26
Change to `ge(5)` (`greater or equal`) – jezrael Oct 17 '17 at 15:27
You're a star! thanks! @Zero thanks for your suggestion too :) – Kvothe Oct 17 '17 at 15:29
I'm curious; does `apply` run at ~the same speed as a python `for` loop in this case (i.e. not vectorized)? I was under the impression that `apply` does, but can't test atm. – roganjosh Oct 17 '17 at 15:33
If you want a faster solution, it is best to compare each column separately; `apply` is a `loop in pandas`, so it should be slower than native pandas functions. [This](https://stackoverflow.com/questions/24870953/does-iterrows-have-performance-issues/24871316#24871316) is a nice comparison of pandas approaches. – jezrael Oct 17 '17 at 15:35
That's a nice link, thanks. The main thing I was thinking when back at a PC was to test if you could vectorize the check over the two columns separately for speed. I guess `apply` wins for readability here :) – roganjosh Oct 17 '17 at 15:40
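
The two points from the comments, a threshold of 5 and a per-column check without `apply`, can be combined in one sketch. It assumes string-typed data as in the question; the variable names threshold, foo_ok and bar_ok are illustrative, and no timing claim is made here:

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', '1', '2', '3'],
                   'bar': ['b', '10', '25', '9']})

threshold = 5  # cutoff asked about in the comments

# Convert each column once with pd.to_numeric instead of DataFrame.apply;
# non-numeric values become NaN, which compares as False.
foo_ok = pd.to_numeric(df['foo'], errors='coerce').ge(threshold)
bar_ok = pd.to_numeric(df['bar'], errors='coerce').ge(threshold)

# 1 when either column holds a number >= threshold.
df['baz'] = (foo_ok | bar_ok).astype(int)
print(df)
```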
