
I am trying to fill the missing values in the data frame, but all of the values were replaced with None.

Here is the example I have tried:

# Basic libraries
import os
import pandas as pd
import numpy as np

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
import folium
#import folium.plugins as plugins
from wordcloud import WordCloud
import plotly.express as px

data_dict = {'First':[100, 90, np.nan, 95], 
        'Second': [30, 45, 56, np.nan], 
        'Third':[np.nan, 40, 80, 98]} 
  
# Creating a DataFrame from a dict of lists
df1 = pd.DataFrame(data_dict)

# First try: using the column name
df1.loc[:,'First'] = df1.loc[:,'First'].fillna(method='ffill', inplace=True)

# Second try: using a list of columns
value = 0  # hypothetical fill value; the original snippet never defines `value`
list_columns = ['First','Second','Third']
df1.loc[:,list_columns] = df1.loc[:,list_columns].fillna(value, inplace=True)
df1

As shown, I tried two approaches to narrow down the cause: first with a single column name, then with a list of column names. Unfortunately, the result is the same in both cases.

Does anyone have a recommendation?

Sander van den Oord
  • You are already specifying inplace, so you don't need to assign the value again. That might be the problem. df1.loc[:,list_columns].fillna(value, inplace=True) should be enough. – Can Sucuoglu Nov 10 '20 at 19:23
  • Please don't write code this way! Simply access the column directly `df1['First'] = df1['First'].fillna(method='ffill')` OR better: `df1['First'] = df1['First'].ffill()`. As a general rule of thumb only use `inplace=True` if you really have to. You may have some unforeseen issues in the future maybe not with this code but with other code if you use `inplace=True` frequently. – David Erickson Nov 10 '20 at 19:24
  • If you are using multiple columns, `loc` is fine. You could also do `df[list_columns] = df[list_columns].fillna()` – David Erickson Nov 10 '20 at 19:29
  • @DavidErickson Thanks for your comment. But the last command `df[list_columns] = df[list_columns].fillna()` did not work, and the issue still happens. Also, I am interested in using `inplace=True`. So, is there any other recommendation? – Qaddomi Obaid Nov 11 '20 at 10:04
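A minimal sketch, assuming pandas and NumPy are installed, of why the bare `fillna()` call in the comments fails: pandas requires a fill value (or, in older versions, a `method`), so calling it with no arguments raises instead of filling anything. Passing a value and reassigning the result does work; the `0` used here is an arbitrary placeholder, not something from the thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
    "Third": [np.nan, 40, 80, 98],
})
list_columns = ["First", "Second", "Third"]

# fillna() with nothing to fill with is rejected by pandas.
try:
    df[list_columns].fillna()
    raised = False
except (ValueError, TypeError):
    raised = True

# Supplying a fill value (0 is a placeholder) and reassigning the result works.
df[list_columns] = df[list_columns].fillna(0)
print(df)
```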


change

df1.loc[:,'First'] = df1.loc[:,'First'].fillna(method='ffill', inplace=True)

to

df1.loc[:,'First'].fillna(method='ffill', inplace=True)

This is because with inplace=True the changes are made directly to the original DataFrame.

As for the None values: an inplace call modifies the object and returns None, since there is nothing new to return. Assigning that None back to the column is what replaces all of its values with None.
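A small sketch (assuming pandas and NumPy are installed) contrasting the two forms. In pandas 1.x and 2.x, `fillna(..., inplace=True)` mutates the DataFrame and returns None, so there is nothing useful to assign; without `inplace`, `fillna` returns a new DataFrame, and that is what should be assigned. The fill value `0` is only a placeholder.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"First": [100, 90, np.nan, 95]})

# Inplace form: df itself is modified; the return value is None in pandas 1.x/2.x,
# which is why assigning it back wipes out the column.
result = df.fillna(0, inplace=True)
print(result)

# Non-inplace form: fillna returns a new DataFrame, so assign that instead.
df2 = pd.DataFrame({"First": [100, 90, np.nan, 95]})
df2 = df2.fillna(0)
print(df2["First"].tolist())
```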


To fill every column with a constant value (10 here):

for col in df1.columns:
    df1[col].fillna(10, inplace=True)
df1
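If different columns need different fill values, `fillna` also accepts a dict mapping column names to values, which avoids the loop. A sketch assuming pandas and NumPy are installed; the values 10 and 20 are arbitrary placeholders, and columns left out of the dict keep their NaNs.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
    "Third": [np.nan, 40, 80, 98],
})

# Each key is a column name and each value is that column's fill value.
# "Third" is not in the dict, so its NaN is left untouched.
df1 = df1.fillna({"First": 10, "Second": 20})
print(df1)
```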

PS: For future readers, avoid inplace; see the Stack Overflow question "In pandas, is inplace = True considered harmful, or not?"

ssp4all

If you want to forward-fill, you can just do:

df1 = df1.ffill()

This results in:

    First   Second  Third
0   100.0   30.0    NaN
1   90.0    45.0    40.0
2   90.0    56.0    80.0
3   95.0    56.0    98.0

There's still one NaN value (row 0 of Third), so we can also back-fill:

df1 = df1.bfill()

Final result:

    First   Second  Third
0   100.0   30.0    40.0
1   90.0    45.0    40.0
2   90.0    56.0    80.0
3   95.0    56.0    98.0
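The two steps above can be chained into one expression. A self-contained sketch, assuming pandas and NumPy are installed and using the question's data: `ffill` fills from the previous row, and `bfill` then handles the leading NaN that has no previous row.

```python
import numpy as np
import pandas as pd

data_dict = {
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
    "Third": [np.nan, 40, 80, 98],
}
df1 = pd.DataFrame(data_dict)

# Forward-fill first, then back-fill whatever is still missing (leading NaNs).
df1 = df1.ffill().bfill()
print(df1)
```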

If you only want to forward-fill NaNs in specific columns, use the following. Note that I am NOT using inplace=True: combining inplace=True with an assignment is why your code wasn't working before.

columns_to_fillna = ['Second', 'Third']
df1.loc[:, columns_to_fillna] = df1.loc[:, columns_to_fillna].ffill()

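If you really want to modify df1 in place, which is not advised, be aware that df1.loc[:, columns_to_fillna] returns a copy, so calling .ffill(inplace=True) on it leaves df1 unchanged. DataFrame.update does modify df1 in place:

columns_to_fillna = ['Second', 'Third']
df1.update(df1[columns_to_fillna].ffill())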
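A quick check, assuming pandas and NumPy are installed, of whether an inplace fill on a `.loc` selection with a list of columns reaches the original DataFrame. Selecting several columns this way builds a new DataFrame, so the inplace call only changes that copy. `DataFrame.update`, which is documented to modify the caller in place using the non-NA values of its argument, is shown for contrast; the name `subset` is just for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
    "Third": [np.nan, 40, 80, 98],
})
columns_to_fillna = ["Second", "Third"]

# .loc with a list of columns returns a copy; the inplace fill stays in that copy.
subset = df1.loc[:, columns_to_fillna]
subset.ffill(inplace=True)
still_missing = df1["Second"].isna().any()  # df1 is unaffected

# update() writes the forward-filled values back into df1 itself.
# Third's leading NaN has no earlier value, so it remains NaN.
df1.update(df1[columns_to_fillna].ffill())
print(df1)
```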

The reasons inplace is not advised are discussed here:
https://stackoverflow.com/a/60020384/6366770

Sander van den Oord
  • Thanks for your comment. It worked well, but what about filling the missing values in particular columns? – Qaddomi Obaid Nov 11 '20 at 09:49
  • I really appreciate your help. But I really need to use `inplace=True`, because I need the changes to be reflected directly in the DataFrame. I could use your solution, but I would need an additional command to apply these changes to the original DataFrame. If there is such a solution, I would be thankful. – Qaddomi Obaid Nov 11 '20 at 10:11
  • @QaddomiObaid Please see this answer as to why you should try to avoid `inplace=True`: https://stackoverflow.com/a/60020384/6366770 – David Erickson Nov 11 '20 at 11:00
  • Thx @DavidErickson I updated my answer again, using your comments :) – Sander van den Oord Nov 11 '20 at 11:58
  • @DavidErickson Thanks a lot. I read the linked answer and now I understand the issue. But is there a way to reflect the changes in the original DataFrame instead of using `inplace=True`? Maybe using copy? I really need the changes to take effect on the main DataFrame, because a bunch of subsequent changes will be made to it. I am a beginner and I need an expert recommendation. – Qaddomi Obaid Nov 11 '20 at 17:40
  • @QaddomiObaid there are several alternatives. For multiple columns, you can also do: `for col in list_columns: df1[col] = df1[col].ffill()` That would be my personal choice. Then, to display the results, just write `df1`. – David Erickson Nov 11 '20 at 18:09
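A sketch of the loop from the last comment, assuming pandas and NumPy are installed, that also addresses the concern about changes reaching the main DataFrame. Assigning to `df1[col]` replaces the column inside the existing DataFrame object rather than creating a new DataFrame, so any other name bound to that object (the hypothetical `alias` here) sees the filled values without `inplace=True`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
    "Third": [np.nan, 40, 80, 98],
})
alias = df1  # a second reference to the same DataFrame object
list_columns = ["First", "Second", "Third"]

# Column assignment mutates df1 itself, so alias reflects the change too.
for col in list_columns:
    df1[col] = df1[col].ffill()

print(alias)
```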