I'm working with Python 3.6.5.
Here is a small script that generates a DataFrame with MultiIndex columns and some NaN values.
import pandas as pd
import numpy as np
att_1 = ['X', 'Y']
att_2 = ['a', 'b']
df_1 = pd.DataFrame(np.random.randint(10, 19, size=(5, 2)), columns=att_2,
                    index=[10, 20, 30, 35, 40])
df_2 = pd.DataFrame(np.random.randint(20, 29, size=(5, 2)), columns=att_2,
                    index=[20, 25, 40, 50, 80])
# Concatenate both DataFrames side by side; the keys become a new top column level
df = pd.concat([df_1, df_2], keys=att_1, axis=1)
I get this DataFrame:
print(df)
       X           Y
       a     b     a     b
10  17.0  17.0   NaN   NaN
20  15.0  11.0  20.0  28.0
25   NaN   NaN  23.0  24.0
30  12.0  16.0   NaN   NaN
35  10.0  10.0   NaN   NaN
40  15.0  14.0  25.0  28.0
50   NaN   NaN  22.0  22.0
80   NaN   NaN  23.0  21.0
I would like to replace the NaN values with the last valid value, but only for one column. For example, for the column ('X', 'b'), I would like to get this:
print(df)
       X           Y
       a     b     a     b
10  17.0  17.0   NaN   NaN
20  15.0  11.0  20.0  28.0
25   NaN  11.0  23.0  24.0
30  12.0  16.0   NaN   NaN
35  10.0  10.0   NaN   NaN
40  15.0  14.0  25.0  28.0
50   NaN  14.0  22.0  22.0
80   NaN  14.0  23.0  21.0
I tried this:
# Replace NaN value by last valid value for column named 'X','b'
df['X']['b'].fillna(method='ffill', inplace=True)
But I get this message (a SettingWithCopyWarning): "A value is trying to be set on a copy of a slice from a DataFrame"
I cannot find a solution for a DataFrame with MultiIndex columns. The only thing I found is this link, which does not give me much hope: https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.MultiIndex.fillna.html
Does anyone have an idea to help me?
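
For reference, here is a minimal sketch of the direction I suspect might work. It assumes the message comes from the chained selection df['X']['b'], and that selecting the column by its full tuple key ('X', 'b') and assigning the filled result back in a single step avoids it. The small frame below uses fixed illustrative values rather than the random data from my script.

```python
import numpy as np
import pandas as pd

# Small frame with the same column layout as in the question
# (fixed, illustrative values instead of random ones).
columns = pd.MultiIndex.from_product([['X', 'Y'], ['a', 'b']])
df = pd.DataFrame(
    [[17.0, 17.0, np.nan, np.nan],
     [15.0, 11.0, 20.0, 28.0],
     [np.nan, np.nan, 23.0, 24.0],
     [12.0, 16.0, np.nan, np.nan]],
    index=[10, 20, 25, 30],
    columns=columns,
)

# Assumption: address the column by its full tuple key and assign the
# forward-filled Series back, instead of chaining df['X']['b'] with inplace=True.
df[('X', 'b')] = df[('X', 'b')].ffill()

print(df)
```

With this sketch only the ('X', 'b') column should be forward-filled; the other columns, including ('X', 'a'), keep their NaN values.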