-1
     A1   A2  A3   B1   B2   B3
aa  1.0  4.0   9  NaN  NaN  2.0
bb  NaN  NaN   9  9.0  2.0  2.0
cc  3.0  2.0   7  1.0  3.0  2.0
dd  4.0  5.0   7  NaN  4.0  5.0
ee  5.0  NaN   1  5.0  1.0  5.0
ff  3.0  5.0   2  5.0  2.0  NaN

I need to fill the missing values (Na) based on the following parameters:

  1. if the row(aa, bb, cc, dd, ff) has a missing value for either of columns(A1, A2, A3 or B1, B2, B3), then fill the missing value with the mean of the values of the other two rows of either (A1-3 or B1-3) example: (dd:B1) should be (5+4)/2 = 4.5

  2. if the rows have two missing values belong to either (A1-A3 or B1-B3) then fill it with the available value in the third column of the category(A or B) example: (bb:A1 and bb: A2) == 9 as (bb: A3 is 9)


Expected output:

     A1   A2   A3   B1   B2   B3
aa  1.0  4.0  9.0  2.0  2.0  2.0
bb  9.0  9.0  9.0  9.0  2.0  2.0
cc  3.0  2.0  7.0  1.0  3.0  2.0
dd  4.0  5.0  7.0  4.5  4.0  5.0
ee  5.0  3.0  1.0  5.0  1.0  5.0
ff  3.0  5.0  2.0  5.0  2.0  3.5
ansev
  • 30,322
  • 5
  • 17
  • 31
sp29
  • 363
  • 4
  • 11
  • 2
    Could please add the output as code instead of image, followed by the desired output if posible? – Celius Stingher Sep 16 '19 at 20:10
  • Please look into https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples on how to make good examples – niraj Sep 16 '19 at 20:17

2 Answers2

1

Here is one-way:

df.replace({'Na':np.nan}).T\
  .groupby(df.columns.str[0], group_keys=False)\
  .apply(lambda x: x.fillna(x.mean())).T

Output:

     A1   A2   A3   B1   B2   B3
aa  1.0  4.0  9.0  2.0  2.0  2.0
bb  9.0  9.0  9.0  9.0  2.0  2.0
cc  3.0  2.0  7.0  1.0  3.0  2.0
dd  4.0  5.0  7.0  4.5  4.0  5.0
ee  5.0  3.0  1.0  5.0  1.0  5.0
ff  3.0  5.0  2.0  5.0  2.0  3.5

Details:

First, in the original dataframe, there were string 'Na' instead of np.nan. Use replace with a dictionary to change 'Na' ton np.Nan, we we can use fillna.

Transpose the dataframe such that columns become rows and rows becomes columns, it is easier and you have more functionality with grouping on rows instead of columns. Use, the first letter of the column header to group on by using the .str accessor and slicing position 0. Next, we ca use fillna a with the mean of the x values.

Scott Boston
  • 147,308
  • 15
  • 139
  • 187
0

first you maybe need:

df=df.replace({'Na':np.nan})

now df is:

     A1   A2  A3   B1   B2   B3
aa  1.0  4.0   9  NaN  NaN  2.0
bb  NaN  NaN   9  9.0  2.0  2.0
cc  3.0  2.0   7  1.0  3.0  2.0
dd  4.0  5.0   7  NaN  4.0  5.0
ee  5.0  NaN   1  5.0  1.0  5.0
ff  3.0  5.0   2  5.0  2.0  NaN

Then you can use:

c=df.columns.str.contains('A')
df2=df.T.groupby(c).mean().T
df2.columns=['B','A']
df=pd.concat([df.T[c].fillna(df2['A']),df.T[~c].fillna(df2['B'])]).T
print(df)


     A1   A2   A3   B1   B2   B3
aa  1.0  4.0  9.0  2.0  2.0  2.0
bb  9.0  9.0  9.0  9.0  2.0  2.0
cc  3.0  2.0  7.0  1.0  3.0  2.0
dd  4.0  5.0  7.0  4.5  4.0  5.0
ee  5.0  3.0  1.0  5.0  1.0  5.0
ff  3.0  5.0  2.0  5.0  2.0  3.5
ansev
  • 30,322
  • 5
  • 17
  • 31