
Input df

ID  Date        TAVG  TMAX  TMIN
1   01-01-2020  NaN   26    21
2   01-01-2020  15    16    NaN
3   01-01-2020  25    29    18
1   02-01-2020  16    NaN   16
2   02-01-2020  NaN   26    20
.....

The code I am using

for index, row in df.iterrows():

    if [(row["TMIN"].isnull()) & (row["TAVG"].notnull()) & (row["TMAX"].notnull())]:
        row["TMIN"] = (2 * row["TAVG"]) - row["TMAX"]

    if [(row["TMAX"].isnull()) & (row["TMIN"].notnull()) & (row["TAVG"].notnull())]:
        row["TMAX"] = (2 * row["TAVG"]) - row["TMIN"]

    if [(row["TAVG"].isnull()) & (row["TMIN"].notnull()) & (row["TMAX"].notnull())]:
        row["TAVG"] = (row["TMIN"] + row["TMAX"]) / 2

When I run this, I get the following error:

    if [(row["TMIN"].isnull()) & (row["TAVG"].notnull()) & (row["TMAX"].notnull())]:
AttributeError: 'float' object has no attribute 'isnull'

How can I fix this? Is there an alternative way to achieve the same result?

RoshADM
  • For the second dupe, the solution changes slightly: `df['TMIN'] = df['TMIN'].fillna(df['TAVG'] * 2 - df['TMAX']); df['TMAX'] = df['TMAX'].fillna(df['TAVG'] * 2 - df['TMIN']); df['TAVG'] = df['TAVG'].fillna(df[['TMAX', 'TMIN']].mean(axis=1))` – jezrael Oct 20 '21 at 10:23
  • @jezrael I think it would be better if you could provide solutions as answers; comments don't really instill that much confidence in the solution provided. It would also help other beginners. – RoshADM Oct 20 '21 at 10:29
  • @9769953 `df['TMIN'].fillna(df['TAVG'] * 2 - df['TMAX'], inplace=True)`; Will this also handle nulls in TMIN/TMAX columns? I'm a little doubtful whether this would work... – RoshADM Oct 20 '21 at 10:35
  • @9769953 - You are right, only the less precise dupes were removed. – jezrael Oct 20 '21 at 10:36
  • @RoshADM True, it wouldn't. But consider the case where there is both a null value in TMIN *and* a null value in one of the two other columns. You'd be replacing a null value with a null value. The result would still be a null value, which would also be what you had in your original case (if it would work). – 9769953 Oct 20 '21 at 10:36
  • @9769953 Understood. It's a shame really that we have to discuss this here, an upvoted answer would have had better visibility for all. – RoshADM Oct 20 '21 at 10:39
  • @9769953 - create an answer. – jezrael Oct 20 '21 at 10:42
  • @9769953 Also I think `inplace` is not good practice, check [this](https://www.dataschool.io/future-of-pandas/#inplace) and [this](https://github.com/pandas-dev/pandas/issues/16529) – jezrael Oct 20 '21 at 10:43
  • @jezrael `inplace` has its time and place, but you have to know when and where. I've offered both options. – 9769953 Oct 20 '21 at 10:52
  • @9769953 - Super! – jezrael Oct 20 '21 at 10:52

2 Answers


.isnull() and .notnull() are methods of Series/columns (and of DataFrames). Inside the loop you're accessing a single element of a row, which happens to be a float, and a float has no such method. That causes the error.
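A minimal reproduction of the cause, using a cut-down, two-row copy of the question's data (the blank cells are assumed to be NaN): the column is a Series and has .isnull(), while a single value taken from a row is a plain float scalar without it.

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's data; NaN marks the blank cells
df = pd.DataFrame({
    "ID": [1, 2],
    "Date": ["01-01-2020", "01-01-2020"],
    "TAVG": [np.nan, 15.0],
    "TMAX": [26.0, 16.0],
    "TMIN": [21.0, np.nan],
})

# A whole column is a Series, so the method exists
print(df["TMIN"].isnull().tolist())  # [False, True]

# One element of a row is just a float scalar: no .isnull() method
_, row = next(df.iterrows())
value = row["TMIN"]
print(type(value), hasattr(value, "isnull"))
```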

For a lot of cases in Pandas, you shouldn't iterate over the rows individually: work column-wise instead, and skip the loop.

Your particular problem can be translated into a column-wise operation:

sel = df['TMIN'].isnull() & df['TAVG'].notnull() & df['TMAX'].notnull()
df.loc[sel, 'TMIN'] = df.loc[sel, 'TAVG'] * 2 - df.loc[sel, 'TMAX']

and similarly for the other two columns, all without iterrows() or any other loop.
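Spelled out for all three columns on the question's sample data (assuming the blank cells are NaN), the mask-based approach might look like this sketch. Each mask mirrors one of the question's if conditions:

```python
import numpy as np
import pandas as pd

# The question's sample data; NaN marks the blank cells
df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2],
    "Date": ["01-01-2020", "01-01-2020", "01-01-2020", "02-01-2020", "02-01-2020"],
    "TAVG": [np.nan, 15, 25, 16, np.nan],
    "TMAX": [26, 16, 29, np.nan, 26],
    "TMIN": [21, np.nan, 18, 16, 20],
})

# Rows where TMIN is missing but TAVG and TMAX are present
sel = df["TMIN"].isnull() & df["TAVG"].notnull() & df["TMAX"].notnull()
df.loc[sel, "TMIN"] = df.loc[sel, "TAVG"] * 2 - df.loc[sel, "TMAX"]

# Rows where TMAX is missing but TAVG and TMIN are present
sel = df["TMAX"].isnull() & df["TAVG"].notnull() & df["TMIN"].notnull()
df.loc[sel, "TMAX"] = df.loc[sel, "TAVG"] * 2 - df.loc[sel, "TMIN"]

# Rows where TAVG is missing but TMIN and TMAX are present
sel = df["TAVG"].isnull() & df["TMIN"].notnull() & df["TMAX"].notnull()
df.loc[sel, "TAVG"] = (df.loc[sel, "TMIN"] + df.loc[sel, "TMAX"]) / 2

print(df)
```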

However, since you are apparently trying to replace NaNs/null values with values computed from the other columns, you can use .fillna() here:

df['TMIN'].fillna(df['TAVG'] * 2 - df['TMAX'], inplace=True)

or, if you don't like inplace (because you don't want to change the original dataframe, or want to use the result directly in a chained computation):

df['tmin2'] = df['TMIN'].fillna(df['TAVG'] * 2 - df['TMAX'])

and for the other two columns:

df['tmax2'] = df['TMAX'].fillna(2 * df['TAVG'] - df['TMIN'])
df['tavg2'] = df['TAVG'].fillna((df['TMIN'] + df['TMAX']) / 2)

You may ask what happens if a TMIN cell is null and the TAVG or TMAX value (or both) is null as well. In that case, you'd be replacing the null value with null, so nothing changes. Given the conditions in your original if statements, that would also be the case in your original code.
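A sketch of the fillna() route for all three columns, assigning each result back to the original column instead of using inplace (in recent pandas versions, df['TMIN'].fillna(..., inplace=True) is chained assignment: pandas 2.2 warns about it, and under Copy-on-Write it does not update df). ID and Date are left out for brevity, and the last row is a hypothetical addition with both TAVG and TMAX missing, to show that such a row stays null. The explicit (TMIN + TMAX) / 2 is used for TAVG because DataFrame.mean(axis=1), as suggested in the comments, skips NaN by default and would fill TAVG from TMIN alone in that last row.

```python
import numpy as np
import pandas as pd

# The question's measurements (ID and Date omitted), plus a hypothetical
# last row in which both TAVG and TMAX are missing
df = pd.DataFrame({
    "TAVG": [np.nan, 15, 25, 16, np.nan, np.nan],
    "TMAX": [26, 16, 29, np.nan, 26, np.nan],
    "TMIN": [21, np.nan, 18, 16, 20, 19],
})

# Fill each column from the other two and assign the result back (no inplace)
df["TMIN"] = df["TMIN"].fillna(df["TAVG"] * 2 - df["TMAX"])
df["TMAX"] = df["TMAX"].fillna(df["TAVG"] * 2 - df["TMIN"])
# NaN + anything is NaN, so rows missing TMIN or TMAX keep a null TAVG
df["TAVG"] = df["TAVG"].fillna((df["TMIN"] + df["TMAX"]) / 2)

print(df)
```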

9769953

You can also do the check at the row level, like this:

import pandas as pd

pd.isna(row["TMIN"])

or

pd.isnull(row["TMIN"])
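The top-level functions pd.isna/pd.isnull (and their counterparts pd.notna/pd.notnull) accept single values as well as Series, which is why they avoid the AttributeError. A quick check with a few scalars:

```python
import numpy as np
import pandas as pd

# Top-level pandas functions work on scalars, not just on Series
print(pd.isna(np.nan))   # True
print(pd.isnull(None))   # True
print(pd.isna(21.0))     # False
print(pd.notna(15.0))    # True

# A plain equality test does not work, because NaN never equals itself
print(np.nan == np.nan)  # False
```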

Your code will then look like this:

for index, row in df.iterrows():
    # Plain boolean conditions: a list such as [x] is always truthy
    if pd.isnull(row["TMIN"]) and pd.notnull(row["TAVG"]) and pd.notnull(row["TMAX"]):
        # row may be a copy, so write back into df with .at (assumes unique index labels)
        df.at[index, "TMIN"] = (2 * row["TAVG"]) - row["TMAX"]

    if pd.isnull(row["TMAX"]) and pd.notnull(row["TMIN"]) and pd.notnull(row["TAVG"]):
        df.at[index, "TMAX"] = (2 * row["TAVG"]) - row["TMIN"]

    if pd.isnull(row["TAVG"]) and pd.notnull(row["TMIN"]) and pd.notnull(row["TMAX"]):
        df.at[index, "TAVG"] = (row["TMIN"] + row["TMAX"]) / 2
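Two details of a row loop like this are easy to get wrong. A condition wrapped in square brackets is a one-element list, which is truthy even when its element is False. And the pandas iterrows() documentation warns that, depending on the dtypes, the yielded row is a copy, so writing to it is not guaranteed to update the DataFrame. A small check with hypothetical one-row data (the mixed dtypes make the row a copy):

```python
import numpy as np
import pandas as pd

# A one-element list is truthy even when its only element is False,
# so `if [cond]:` always runs its body
condition = [False]
print(bool(condition))  # True

# Mixed dtypes (Date is a string column), so iterrows() yields copies
df = pd.DataFrame({
    "Date": ["01-01-2020"],
    "TAVG": [15.0],
    "TMAX": [16.0],
    "TMIN": [np.nan],
})

for index, row in df.iterrows():
    row["TMIN"] = 2 * row["TAVG"] - row["TMAX"]  # only changes the copy
after_row_assignment = df.loc[0, "TMIN"]
print(after_row_assignment)  # nan

for index, row in df.iterrows():
    df.at[index, "TMIN"] = 2 * row["TAVG"] - row["TMAX"]  # writes into df
print(df.loc[0, "TMIN"])  # 14.0
```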