I have an issue with this DataFrame. The code is below:

import numpy as np
import pandas as pd
from numpy import nan
tostk = np.asarray([['A', nan, 6.0, nan, nan],
       ['A', 3.0, nan, nan, nan],
       ['A', nan, nan, 9.0, nan],
       ['A', nan, 5.0, nan, nan],
       ['A', nan, nan, nan, 7.0],
       ['B', nan, 8.0, nan, 7.0],
       ['B', nan, nan, 6.0, nan],
       ['B', 6.0, nan, nan, 8.0],
       ['B', 5.0, nan, nan, 6.0],
       ['B', nan, nan, 4.0, nan]])
pd.DataFrame(tostk)

I need to replace the nan values within each category (A and B) with the next available value below them in the same category. I tried bfill, but the problem with bfill is that it crosses category boundaries: a value that belongs to category B gets filled into the trailing nan rows of category A.

Expected Result

res = np.asarray([['A', 3.0, 6.0, 9.0, 7.0],
           ['A', 3.0, 5.0, 9.0, 7.0],
           ['A', nan, 5.0, 9.0, 7.0],
           ['A', nan, 5.0, nan, 7.0],
           ['A', nan, nan, nan, 7.0],
           ['B', 6.0, 8.0, 6.0, 7.0],
           ['B', 6.0, nan, 6.0, 8.0],
           ['B', 6.0, nan, 4.0, 8.0],
           ['B', 5.0, nan, 4.0, 6.0],
           ['B', nan, nan, 4.0, nan]])
pd.DataFrame(res)

Any ideas are welcome

Nick ODell
Carlos Carvalho

1 Answer


I found that the linked duplicate worked, with two wrinkles:

  1. The last 4 columns are object type. Because np.asarray coerces the mixed rows into a string array, each nan ends up as the string 'nan', which Pandas does not treat as an NA value. I had to convert to float.
  2. The linked solution drops the group labels, which is clearly not what you want.
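To illustrate the first wrinkle, here is a minimal sketch on a cut-down array (the two-row `arr` is made up for the example, not the question's data). It shows that the values arrive as strings and that the missing entries only become detectable after the float conversion:

```python
import numpy as np
import pandas as pd

# Mixing strings and floats makes NumPy coerce everything to strings.
arr = np.asarray([['A', np.nan, 6.0],
                  ['B', 5.0, np.nan]])
print(arr.dtype.kind)   # U  (a unicode string array)
print(repr(arr[0, 1]))  # the string 'nan', not a float

df = pd.DataFrame(arr)
# The string 'nan' is not recognised as a missing value.
print(df[1].isna().any())  # False

# Converting to float turns the string 'nan' into a real NaN.
as_float = df[1].astype(float)
print(as_float.isna().any())  # True
```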

I used the same setup code as you:

import numpy as np
import pandas as pd
from numpy import nan
tostk = np.asarray([['A', nan, 6.0, nan, nan],
       ['A', 3.0, nan, nan, nan],
       ['A', nan, nan, 9.0, nan],
       ['A', nan, 5.0, nan, nan],
       ['A', nan, nan, nan, 7.0],
       ['B', nan, 8.0, nan, 7.0],
       ['B', nan, nan, 6.0, nan],
       ['B', 6.0, nan, nan, 8.0],
       ['B', 5.0, nan, nan, 6.0],
       ['B', nan, nan, 4.0, nan]])
df = pd.DataFrame(tostk)

Then convert to float:

df[[1, 2, 3, 4]] = df[[1, 2, 3, 4]].astype(float)
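If you control how the frame is built, the conversion step may not be needed at all. As a rough sketch (the shortened `rows` list is an assumption for illustration), passing the nested list straight to pd.DataFrame, without going through np.asarray, should leave the numeric columns as float64 with real NaN values:

```python
import pandas as pd
from numpy import nan

# Hypothetical, shortened version of the question's data.
rows = [['A', nan, 6.0],
        ['A', 3.0, nan],
        ['B', 5.0, nan]]

# No np.asarray here, so pandas infers a dtype per column.
df = pd.DataFrame(rows)
print(df.dtypes)

# The numeric columns are float64 and their NaNs are detected.
print(df[1].isna().sum())  # 1
print(df[2].isna().sum())  # 2
```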

Then do the backfilling:

print(df.groupby(0).apply(lambda x: x.fillna(method='bfill')))

Output:

   0    1    2    3    4
0  A  3.0  6.0  9.0  7.0
1  A  3.0  5.0  9.0  7.0
2  A  NaN  5.0  9.0  7.0
3  A  NaN  5.0  NaN  7.0
4  A  NaN  NaN  NaN  7.0
5  B  6.0  8.0  6.0  7.0
6  B  6.0  NaN  6.0  8.0
7  B  6.0  NaN  4.0  8.0
8  B  5.0  NaN  4.0  6.0
9  B  NaN  NaN  4.0  NaN
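On recent Pandas versions, fillna(method='bfill') is deprecated in favour of bfill(). One possible alternative, sketched here on a smaller made-up frame, is to call bfill() on the grouped value columns and assign the result back, which keeps the group labels in place. The name `value_cols` is just a helper introduced for this example:

```python
import pandas as pd
from numpy import nan

# Hypothetical, shortened data: column 0 is the category.
df = pd.DataFrame([
    ['A', nan, 6.0],
    ['A', 3.0, nan],
    ['A', nan, 5.0],
    ['B', nan, 8.0],
    ['B', 6.0, nan],
    ['B', nan, nan],
])

value_cols = [1, 2]  # columns to backfill (illustrative name)

# Backfill within each category only, then write the result back.
# Column 0 is untouched, so the group labels are preserved.
df[value_cols] = df.groupby(0)[value_cols].bfill()
print(df)
```

Values from category B never leak into category A here, because the backfill runs separately inside each group.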
Nick ODell