189

I am working with this Pandas DataFrame in Python.

File    heat    Farheit Temp_Rating
   1    YesQ         75         N/A
   1    NoR         115         N/A
   1    YesA         63         N/A
   1    NoT          83          41
   1    NoY         100          80
   1    YesZ         56          12
   2    YesQ        111         N/A
   2    NoR          60         N/A
   2    YesA         19         N/A
   2    NoT         106          77
   2    NoY          45          21
   2    YesZ         40          54
   3    YesQ         84         N/A
   3    NoR          67         N/A
   3    YesA         94         N/A
   3    NoT          68          39
   3    NoY          63          46
   3    YesZ         34          81

I need to replace all NaNs in the Temp_Rating column with the value from the Farheit column.

This is what I need:

File        heat    Temp_Rating
   1        YesQ             75
   1         NoR            115
   1        YesA             63
   1         NoT             41
   1         NoY             80
   1        YesZ             12
   2        YesQ            111
   2         NoR             60
   2        YesA             19
   2         NoT             77
   2         NoY             21
   2        YesZ             54
   3        YesQ             84
   3         NoR             67
   3        YesA             94
   3         NoT             39
   3         NoY             46
   3        YesZ             81

If I do a Boolean selection, I can pick out only one of these columns at a time. The problem is if I then try to join them, I am not able to do this while preserving the correct order.

How can I find only the Temp_Rating rows that contain NaN and replace each one with the value from the same row of the Farheit column?

edesz

7 Answers

263

Assuming your DataFrame is in df:

df['Temp_Rating'] = df['Temp_Rating'].fillna(df['Farheit'])
del df['Farheit']
df.columns = 'File heat Observations'.split()

First, replace any NaN values in Temp_Rating with the corresponding value of df.Farheit. Then delete the 'Farheit' column and rename the columns. Here's the resulting DataFrame:

[image: resulting DataFrame]
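As a minimal sketch of the steps above, the snippet below runs them on a small, hypothetical three-row sample (not the full data from the question). It assigns the result of fillna back to the column rather than relying on inplace=True, and it keeps the Temp_Rating name instead of renaming the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's column names
df = pd.DataFrame({
    "File": [1, 1, 1],
    "heat": ["YesQ", "NoT", "NoY"],
    "Farheit": [75, 83, 100],
    "Temp_Rating": [np.nan, 41, 80],
})

# Fill each missing rating from the same row of Farheit (aligned on the index)
df["Temp_Rating"] = df["Temp_Rating"].fillna(df["Farheit"])

# Drop the column that is no longer needed
df = df.drop(columns="Farheit")

result = df["Temp_Rating"].tolist()
print(df)
```

Because the column held NaN to begin with, it stays a float column after filling; cast it with astype(int) if integer ratings are needed.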

Jonathan Eunice
  • How can this be done if both columns have the object dtype and the missing entries are empty cells instead of N/A? – ashish Feb 05 '20 at 06:04
  • One possible approach to consider: You could first replace the empty string by `NaN` (see [here](https://stackoverflow.com/a/13445630/4057186)) and then use [this](https://stackoverflow.com/a/29177664/4057186) approach. – edesz Feb 05 '20 at 17:04
  • The answer is perfect. If you'd like to stay closer to pandas syntax, I'd suggest deleting columns with `df.drop("Farheit", axis=1)`, but that's probably personal preference. – MichaelA Mar 03 '20 at 11:03
  • @MichaelA Agree, `drop` is now preferred to `del` in Pandas-land. If using a recent Pandas, would recommend `df = df.drop(columns='Farheit')` over numerical axis numbering. – Jonathan Eunice Mar 03 '20 at 16:02
  • How can we use this method if the column names are string labels with spaces? In that case is it better to use the pop method in the answer below? The pop method worked for me but just wondering about other syntax. Thanks. – Edison May 25 '22 at 11:59
  • No, this still works. Any column can be addressed as `df['my column with spaces']` and the setting of all column names can be done with a list, e.g.: `df.columns = ['my file name', 'heat value', 'the temperature in degrees F']`. I prefer spaceless column names in order to use the terse `df.column_name` syntax, but that's a preference not a strict requirement. Spaces can work. – Jonathan Eunice May 25 '22 at 13:02
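The empty-cell case raised in the comments can be sketched roughly as follows. The data is hypothetical, and the sketch assumes the missing entries are exactly the empty string; cells containing whitespace would need a different match.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype columns where missing ratings are empty strings
df = pd.DataFrame({
    "Farheit": ["75", "83", "100"],
    "Temp_Rating": ["", "41", ""],
})

# Turn empty strings into NaN so that fillna treats them as missing
rating = df["Temp_Rating"].replace("", np.nan)

# Fill the gaps from the same row of Farheit
df["Temp_Rating"] = rating.fillna(df["Farheit"])

filled = df["Temp_Rating"].tolist()
print(filled)
```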
62

The solutions mentioned above did not work for me. The method I used was:

df.loc[df['foo'].isnull(),'foo'] = df['bar']
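A small sketch of how this assignment behaves, using made-up values for the placeholder columns foo and bar: the right-hand side is aligned on the index, so only the rows selected by the Boolean mask are overwritten.

```python
import numpy as np
import pandas as pd

# Hypothetical data for the placeholder columns 'foo' and 'bar'
df = pd.DataFrame({
    "foo": [np.nan, 2.0, np.nan],
    "bar": [10.0, 20.0, 30.0],
})

# Boolean mask marking the rows where 'foo' is missing
missing = df["foo"].isnull()

# Only the masked rows are written; 'bar' is matched to them by index label
df.loc[missing, "foo"] = df["bar"]

values = df["foo"].tolist()
print(values)
```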
zsad512
18

@Jonathan's answer is good but overkill; just use pop, which removes the column and returns it in one step:

df['Temp_Rating'] = df['Temp_Rating'].fillna(df.pop('Farheit'))
U13-Forward
8

Another way to solve this problem:

import pandas as pd
import numpy as np

# Missing Temp values are written explicitly as np.nan
ts_df = pd.DataFrame([[1,"YesQ",75,np.nan],[1,"NoR",115,np.nan],[1,"NoT",63,13],[2,"YesT",43,71]],columns=['File','heat','Farheit','Temp'])


def fx(x):
    # Fall back to Farheit when Temp is missing in this row
    if np.isnan(x['Temp']):
        return x['Farheit']
    else:
        return x['Temp']
print(1,ts_df)
# Apply fx row by row (axis=1)
ts_df['Temp']=ts_df.apply(fx,axis=1)

print(2,ts_df)

returns:

(1,    File  heat  Farheit  Temp
0     1  YesQ       75   NaN
1     1   NoR      115   NaN
2     1   NoT       63  13.0
3     2  YesT       43  71.0)
(2,    File  heat  Farheit   Temp
0     1  YesQ       75   75.0
1     1   NoR      115  115.0
2     1   NoT       63   13.0
3     2  YesT       43   71.0)
Markus Dutschke
Felix Feng
6

You can also use mask, which replaces the values where Temp_Rating is NaN with the values from the Farheit column:

df['Temp_Rating'] = df['Temp_Rating'].mask(df['Temp_Rating'].isna(), df['Farheit'])
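To illustrate, here is a short sketch on hypothetical data. It also shows where, which is the mirror image of mask: mask replaces values where the condition is True, while where keeps them where the condition is True.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the question's column names
df = pd.DataFrame({
    "Farheit": [75, 83, 100],
    "Temp_Rating": [np.nan, 41, 80],
})

is_missing = df["Temp_Rating"].isna()

# mask: replace where the condition is True
masked = df["Temp_Rating"].mask(is_missing, df["Farheit"])

# where: keep where the condition is True, so the condition is negated
kept = df["Temp_Rating"].where(~is_missing, df["Farheit"])

print(masked.tolist())
```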
rachwa
5

The accepted answer uses fillna(), which fills in missing values only where the two objects share index labels. As explained nicely here, you can use combine_first to fill in missing values, rows, and index values in situations where the indices of the two dataframes don't match.

df.Col1 = df.Col1.fillna(df.Col2)  # fill in missing values if indices match

# or
df.Col1 = df.Col1.combine_first(df.Col2)  # fill in values, rows, and indices
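The difference is easiest to see on two standalone Series whose indices do not match. The sketch below uses hypothetical values; col1 and col2 stand in for the two columns.

```python
import numpy as np
import pandas as pd

# Hypothetical Series: col2 has an extra index label (2) that col1 lacks
col1 = pd.Series([np.nan, 2.0], index=[0, 1])
col2 = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])

# fillna keeps col1's index and only fills its existing gaps
filled = col1.fillna(col2)

# combine_first uses the union of both indices, so label 2 is added
combined = col1.combine_first(col2)

print(filled.index.tolist(), combined.index.tolist())
```

Note that when both columns live in the same DataFrame, they already share an index, and assigning the result back to a column realigns it to that DataFrame's index, so the two calls give the same column there.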
John
  • Nice answer. In the question here, I didn't expect to have non-overlapping indices in the data, so `.fillna()` was sufficient. Actually, the focus here is on a single column (`Temp_Rating`), where the `NaN`s occur in the data, so the two approaches, `.fillna()` and `combine_first()`, end up producing equivalent output. Nonetheless, this is a really useful approach. – edesz Feb 05 '21 at 02:02
0

Coming very late, but I came across a similar problem and this is how I solved it, seemed a little bit more concise for me... hope it works for everyone in a similar situation

import pandas as pd

def function_a(row):
    # Fall back to Farheit when Temp_Rating is missing (NaN or None)
    if pd.isna(row['Temp_Rating']):
        return row['Farheit']
    return row['Temp_Rating']

df['Temp_Rating'] = df.apply(function_a, axis=1)
df1 = df.drop(columns=['Farheit'])
Zoe