I have a pandas dataframe as follows:

foo bar
a   b
1   10
2   25
3   9

I want to add a new column as follows:

foo bar baz
a   b   0
1   10  1
2   25  1
3   9   1

That is: if row['foo'] or row['bar'] is numeric, then row['baz'] = 1, else 0.

What I have so far is:

def some_function(row):
   if row['foo']>=0 or row['bar']>=0:
      return 1
   return 0

df['baz'] = df.apply(lambda row: some_function(row), axis=1)

But this doesn't work because the dtype is not int. I can't drop non-int rows, because I need them in the dataframe.

Any idea how I can solve this?

Vaibhav Mule
Kvothe

1 Answer

If you want to check for numeric values saved as strings, convert with to_numeric, compare with ge (>=), and use all to check that every value in the row is True:

df['baz'] = df.apply(pd.to_numeric, errors='coerce').ge(0).all(axis=1).astype(int)
print(df)
  foo bar  baz
0   a   b    0
1   1  10    1
2   2  25    1
3   3   9    1
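
To make the one-liner easier to follow, here is a minimal sketch that splits it into steps. It assumes every value in the sample frame is stored as a string, as in the question; the intermediate names `numeric` and `non_negative` are only illustrative.

```python
import pandas as pd

# Sample frame from the question; every value is stored as a string.
df = pd.DataFrame({'foo': ['a', '1', '2', '3'],
                   'bar': ['b', '10', '25', '9']})

# Step 1: convert each column; strings that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Step 2: NaN >= 0 evaluates to False, so non-numeric cells fail the test.
non_negative = numeric.ge(0)

# Step 3: require every column in the row to pass, then cast True/False to 1/0.
df['baz'] = non_negative.all(axis=1).astype(int)

print(df['baz'].tolist())  # [0, 1, 1, 1]
```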

Or, if you need to check the columns separately:

df['baz'] = (pd.to_numeric(df['foo'], errors='coerce').ge(0) | 
            pd.to_numeric(df['bar'], errors='coerce').ge(0)).astype(int)
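
The two snippets above are not equivalent: `all` requires every column to pass, while `|` accepts a row if either column passes, which is closer to the "foo or bar" wording in the question. A small sketch of the difference, using a hypothetical extra row `('x', '7')` that is not in the original data:

```python
import pandas as pd

# The last row is a made-up example where only one column is numeric.
df = pd.DataFrame({'foo': ['a', '1', 'x'],
                   'bar': ['b', '10', '7']})

numeric = df.apply(pd.to_numeric, errors='coerce')

# 1 only when every column holds a number >= 0.
all_cols = numeric.ge(0).all(axis=1).astype(int)

# 1 when at least one column holds a number >= 0 (same as combining with |).
any_col = numeric.ge(0).any(axis=1).astype(int)

print(all_cols.tolist())  # [0, 1, 0]
print(any_col.tolist())   # [0, 1, 1]
```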

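Thanks to Zero for a solution that checks only whether the values are numeric (note that errors must be 'coerce'; 'force' is not a valid option for to_numeric):

df['baz'] = df.apply(pd.to_numeric, errors='coerce').notnull().all(axis=1).astype(int)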
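
The `notnull` check and the `ge(0)` check give different answers for negative numbers. A short sketch, assuming a hypothetical frame that contains `'-1'` (the question's data has no negative values), and using `errors='coerce'` so that unparseable strings become NaN:

```python
import pandas as pd

# Hypothetical data with one negative value stored as a string.
df = pd.DataFrame({'foo': ['a', '-1', '2'],
                   'bar': ['b', '10', '25']})

numeric = df.apply(pd.to_numeric, errors='coerce')

# "Is it a number at all?" -1 counts as numeric.
is_numeric = numeric.notnull().all(axis=1).astype(int)

# "Is it a number >= 0?" -1 is rejected.
non_negative = numeric.ge(0).all(axis=1).astype(int)

print(is_numeric.tolist())    # [0, 1, 1]
print(non_negative.tolist())  # [0, 0, 1]
```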

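But if the columns mix real numbers with strings, compare the types instead. Note that this flags rows where every value is a string with 1, the inverse of the baz column in the question; compare with != str to flag the non-string rows:

df = pd.DataFrame({'foo': ['a', 1, 2, 3], 'bar': ['b', 10, 25, 9]})

df['baz'] = (df.applymap(type) == str).all(axis=1).astype(int)
print(df)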
  bar foo  baz
0   b   a    1
1  10   1    0
2  25   2    0
3   9   3    0

Detail:

print (df.applymap(type))
             bar            foo
0  <class 'str'>  <class 'str'>
1  <class 'int'>  <class 'int'>
2  <class 'int'>  <class 'int'>
3  <class 'int'>  <class 'int'>
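
As a variation on the type comparison, the sketch below produces the `baz` column from the question (0 for the string row, 1 for the numeric rows). It maps over each column with `Series.map` rather than `applymap`, since `applymap` is deprecated in recent pandas releases in favor of `DataFrame.map`; the name `is_str` is only illustrative.

```python
import pandas as pd

# Mixed object columns: real strings and real integers.
df = pd.DataFrame({'foo': ['a', 1, 2, 3], 'bar': ['b', 10, 25, 9]})

# Element-wise test for "is this value a string?", one column at a time.
is_str = df[['foo', 'bar']].apply(
    lambda col: col.map(lambda value: isinstance(value, str))
)

# 1 when no value in the row is a string, 0 otherwise.
df['baz'] = (~is_str).all(axis=1).astype(int)

print(df['baz'].tolist())  # [0, 1, 1, 1]
```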
jezrael
  • Need `df.apply(pd.to_numeric, errors='force').notnull().all(1).astype(int)` perhaps. – Zero Oct 17 '17 at 15:17
  • @jezrael Thanks! Any idea how I can check if the value in 'foo' or 'bar' is >= 5? and apply the above condition? – Kvothe Oct 17 '17 at 15:26
  • Change it to `ge(5)`; `ge` means `greater or equal`. – jezrael Oct 17 '17 at 15:27
  • You're a star! thanks! @Zero thanks for your suggestion too :) – Kvothe Oct 17 '17 at 15:29
  • I'm curious; does `apply` run at ~the same speed as a python `for` loop in this case (i.e. not vectorized)? I was under the impression that `apply` does, but cant test atm. – roganjosh Oct 17 '17 at 15:33
  • If you want a faster solution, it is best to compare each column separately; `apply` is a loop inside pandas, so it should be slower than native pandas functions. [This](https://stackoverflow.com/questions/24870953/does-iterrows-have-performance-issues/24871316#24871316) is a nice comparison of pandas approaches. – jezrael Oct 17 '17 at 15:35
  • Thats a nice link, thanks. The main thing I was thinking when back at a PC was to test if you could vectorize the check over the two columns separately for speed. I guess `apply` wins for readability here :) – roganjosh Oct 17 '17 at 15:40
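
Putting the comment thread together, a minimal sketch of the per-column check with the `>= 5` threshold asked about above. The variable `threshold` and the extra row `('4', '3')` are assumptions added for illustration; that row shows a case where both values are numeric but below the threshold.

```python
import pandas as pd

# Question data plus one hypothetical row where both values are below 5.
df = pd.DataFrame({'foo': ['a', '1', '2', '3', '4'],
                   'bar': ['b', '10', '25', '9', '3']})

threshold = 5  # illustrative name, not from the original post

# Convert and compare each column on its own, without DataFrame.apply.
foo_ok = pd.to_numeric(df['foo'], errors='coerce').ge(threshold)
bar_ok = pd.to_numeric(df['bar'], errors='coerce').ge(threshold)

# 1 when either column holds a number >= threshold.
df['baz'] = (foo_ok | bar_ok).astype(int)

print(df['baz'].tolist())  # [0, 1, 1, 1, 0]
```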