I need to apply a function to each column of a Pandas dataframe, and that function uses the count of NaN in the column. Say that I have this dataframe:

import pandas as pd

df = pd.DataFrame({'Baseball': [3, 1, 2], 'Soccer': [1, 6, 7], 'Rugby': [8, 7, None]})

   Baseball  Soccer  Rugby
0         3       1    8.0
1         1       6    7.0
2         2       7    NaN

I can get the count of NaN in each column with:

df.isnull().sum()

Baseball    0
Soccer      0
Rugby       1

But I can't figure out how to use that result in a function applied to each column. Just as an example, say I want to add the number of NaN in a column to each element in that column to get:

   Baseball  Soccer  Rugby
0         3       1    9.0
1         1       6    8.0
2         2       7    NaN

(My actual function is more complex.) I tried:

def f(x, y):
    return x + y

df2 = df.apply(lambda x: f(x, df.isnull().sum()))

and I get the thoroughly mangled:

          Baseball  Soccer  Rugby
0              NaN     NaN    NaN
1              NaN     NaN    NaN
2              NaN     NaN    NaN
Baseball       NaN     NaN    NaN
Rugby          NaN     NaN    NaN
Soccer         NaN     NaN    NaN

Any idea how to use the count of NaN in each column in a function applied to each column?

Thanks in advance!

Dribbler
  • `df.add(df.isnull().sum())` – ALollz Oct 10 '19 at 21:56
  • Thanks ALollz, but I used add just as an example. I have a much more complex function that I need to use. I edited the question to make it clearer that addition is just an example, so I appreciate the comment! – Dribbler Oct 10 '19 at 22:04
  • Well, then perhaps try to create an example that's a little closer to your underlying problem? Your issue is just a mis-alignment of the addition axis. DataFrame.add(Series) by default aligns the Series Index with the DataFrame columns. Whatever you were doing was aligning along the Index. https://stackoverflow.com/questions/53217607/how-do-i-operate-on-a-dataframe-with-a-series-for-every-column should have much more information, and perhaps will help you with your more complex function. – ALollz Oct 10 '19 at 22:12
  • ALollz, I did update the question to stress that addition was just an example, and I expressed my appreciation that your comment prompted me to do that. Now that you know that it's not just addition (assume that a custom function must be written), can you help? – Dribbler Oct 10 '19 at 22:16
  • @Dribbler are you able to show some data closer to your problem? I think if you pass `axis=0` into your function it will work row-wise and `axis=1` for column-wise – Umar.H Oct 10 '19 at 22:16
  • `df.apply(lambda x : x + df.isnull().sum(), axis = 1)` – vb_rises Oct 10 '19 at 22:21
  • Datanovice and vb_rises: that did it! Many thanks. I had thought axis=1 was a default, so I didn't think to specify it, but I guess that's not the case for all dataframe methods. – Dribbler Oct 10 '19 at 22:24
  • Awesome. As @ALollz pointed it out first, he should provide the answer so it can be closed. Best of luck with your work, Professor – Umar.H Oct 10 '19 at 22:28

2 Answers

Thanks to Datanovice and vb_rises, the answer is:

df.apply(lambda x : x + df.isnull().sum(), axis=1)

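In case anyone has a similar question, I wanted the answer to be clear without the need to read through the comments. I had thought that axis=1 was the default for apply, but it is not: the default is axis=0, which passes each column to the function, while axis=1 passes each row. A row is indexed by the column labels, so it lines up with the Series returned by df.isnull().sum().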
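For anyone whose function only needs the NaN count of the column it is currently working on, here is a minimal sketch of an alternative: compute the count inside the function passed to apply and keep the default axis=0. The body of f is the placeholder from the question, not the real function, and the names per_column, per_row and nan_counts are only illustrative. The axis=1 variant from above is included for comparison.

```python
import pandas as pd

df = pd.DataFrame({'Baseball': [3, 1, 2], 'Soccer': [1, 6, 7], 'Rugby': [8, 7, None]})

def f(values, n_missing):
    # Placeholder for the more complex function mentioned in the question.
    return values + n_missing

# Default axis=0: apply passes each column as a Series, so the NaN count
# for that column is a single scalar and no index alignment is involved.
per_column = df.apply(lambda col: f(col, col.isna().sum()))

# axis=1: apply passes each row as a Series indexed by the column labels,
# which aligns label-by-label with the Series of per-column NaN counts.
nan_counts = df.isna().sum()
per_row = df.apply(lambda row: f(row, nan_counts), axis=1)

print(per_column)
print(per_row)
```

Both give the same values for this example. One difference to be aware of: with axis=1 each row is built as a single Series, so integer columns may come back as floats when a row mixes ints and floats.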

Dribbler

I prefer @ALollz's answer: df.add(df.isnull().sum()).

The lambda function @Dribbler is defining already exists in the form of .add().
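As a rough sketch of that approach on the dataframe from the question (the variable names nan_counts and result are just for illustration): DataFrame.add with a Series matches the Series index against the dataframe's column labels by default, so each column gets its own NaN count added.

```python
import pandas as pd

df = pd.DataFrame({'Baseball': [3, 1, 2], 'Soccer': [1, 6, 7], 'Rugby': [8, 7, None]})

# Series indexed by column name: one NaN count per column.
nan_counts = df.isna().sum()

# By default, add() aligns the Series index with the DataFrame columns,
# so nan_counts['Rugby'] is added to every element of the Rugby column.
result = df.add(nan_counts)

print(result)
```

This only covers plain addition, of course. It is shown here because it makes the alignment rule explicit: the counts are matched to columns by label, not by position.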

gosuto
  • That's why I clarified in the question that my actual function is more complex. I agree that if it were a simple sum, .add would be more sensible. But I created very simple code to illustrate my problem. The Datanovice and vb_rises solution is extensible to all functions, which is what I was after. – Dribbler Oct 11 '19 at 21:32