How can I replace all the non-NaN values in a pandas DataFrame with 1 while leaving the NaN values alone? The code below almost does what I'm looking for, but it also turns the NaN values into 0, so I have to reset them to NaN afterwards.

I would like this

    a    b
0  NaN  QQQ
1  AAA  NaN
2  NaN  BBB

to become this

    a    b
0  NaN   1
1   1   NaN
2  NaN   1

This code is almost what I want

newdf = df.notnull().astype('int')

The above code does this

    a    b
0   0   1
1   1   0
2   0   1
wraith

2 Answers

One way would be to select all non-null values from the original data frame and set them to one:

df[df.notnull()] = 1

This solution on your data:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [np.nan, 'AAA', np.nan], 'b': ['QQQ', np.nan, 'BBB']})
df[df.notnull()] = 1  # overwrite only the non-null cells, in place

df 
    a   b
0   NaN 1
1   1   NaN
2   NaN 1
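
If you would rather not modify `df` in place, a minimal sketch of the same idea with `DataFrame.mask` is below. This is a swapped-in variant, not the assignment shown above; the explicit `dtype=object` is my own assumption, added so the example behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# dtype=object is an assumption here, to keep the str/NaN columns as plain objects
df = pd.DataFrame({'a': [np.nan, 'AAA', np.nan],
                   'b': ['QQQ', np.nan, 'BBB']}, dtype=object)

# mask() replaces cells where the condition is True; NaN cells pass through untouched
newdf = df.mask(df.notna(), 1)
print(newdf)
```

Because `mask` returns a new frame, the original `df` keeps its strings.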
johnchase
  • Much simpler solution than mine, but interestingly I made my own dataframe before OP updated their question and discovered this method doesn't work if you have mixed datatypes in the dataframe – G. Anderson Feb 26 '20 at 22:09
  • That is interesting. It seems to be due to the dictionary present in the cell. I would not have caught that – johnchase Feb 26 '20 at 22:14
  • Me either haha! I still prefer your method for readability, but an interesting caveat to be sure – G. Anderson Feb 26 '20 at 22:16
  • I can confirm this works for my data as well. Haven't tried it on mixed datatypes so I can't speak to that. – wraith Feb 26 '20 at 22:18
  • And, just for fun, another counterintuitive (to me) note: the boolean masking solution runs in `1.2 ms`, the `np.where` solution runs in `512 µs`. Go figure! – G. Anderson Feb 26 '20 at 22:24
  • @G.Anderson I have noticed in the past that `np.where` is very fast. However, I don't know the inner workings well enough to explain why. – johnchase Feb 28 '20 at 15:36
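
For anyone who wants to check the timing comparison from the comments on their own machine, here is a rough sketch using `timeit`. The helper names `with_mask` and `with_where` are hypothetical, the `dtype=object` is my assumption, and the absolute numbers will depend on your hardware, pandas version, and data size, so treat them as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Small test frame; dtype=object is an assumption to keep the columns as plain objects
df = pd.DataFrame({'a': [np.nan, 'AAA', np.nan],
                   'b': ['QQQ', np.nan, 'BBB']}, dtype=object)

def with_mask(frame):
    # Hypothetical helper: boolean-mask assignment on a copy,
    # so every timed run starts from the same data
    out = frame.copy()
    out[out.notnull()] = 1
    return out

def with_where(frame):
    # Hypothetical helper: the np.where approach, rebuilt into a DataFrame
    return pd.DataFrame(np.where(frame.isna(), frame, 1), columns=frame.columns)

runs = 200
mask_time = timeit.timeit(lambda: with_mask(df), number=runs) / runs
where_time = timeit.timeit(lambda: with_where(df), number=runs) / runs
print(f"boolean mask: {mask_time * 1e6:.0f} µs per call")
print(f"np.where:     {where_time * 1e6:.0f} µs per call")
```

Note that the `copy()` inside `with_mask` adds some overhead of its own, so this is not a perfectly like-for-like comparison.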
You can use np.where() with DataFrame.isna() to accomplish this:

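import numpy as np
import pandas as pd
# np.nan instead of np.NaN: the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(data=[[1, np.nan, 5],
                        ['q', np.nan, np.nan],
                        ['7', {'a': 1}, np.nan]],
                  columns=['a', 'b', 'c'])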

    a   b           c
0   1   NaN         5.0
1   q   NaN         NaN
2   7   {'a': 1}    NaN

df1 = pd.DataFrame(np.where(df.isna(), df, 1), columns=df.columns)

    a   b       c
0   1   NaN     1
1   1   NaN     NaN
2   1   1       NaN
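
One caveat with the line above: rebuilding the frame from the `np.where` result drops the original index unless you pass it back in, and the columns come out as object dtype. A sketch of a variant that keeps the index and yields plain float columns follows. The `'x'`, `'y'`, `'z'` labels and the `flags` name are made up for illustration, and using `np.nan` and `1.0` as the two branches is my own tweak rather than part of the answer as written.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[1, np.nan, 5],
                        ['q', np.nan, np.nan],
                        ['7', {'a': 1}, np.nan]],
                  columns=['a', 'b', 'c'],
                  index=['x', 'y', 'z'])  # hypothetical labels, to show the index survives

# Both branches are float scalars, so the result is a float64 array of NaN and 1.0
flags = np.where(df.isna().to_numpy(), np.nan, 1.0)

# Pass index and columns back in so the labels match the original frame
df1 = pd.DataFrame(flags, index=df.index, columns=df.columns)
print(df1)
```

Since only the null mask is used, the dict in column `b` is never compared or copied, just flagged as non-null.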
G. Anderson