Column A of my DataFrame holds people's ID numbers and column B holds their names. In many rows the ID is present but the name is missing. The same person always has the same ID and, presumably, the same name.

How can I locate the nulls in column B, take the corresponding ID from column A, find another row with that ID that does have a name, and use that name to fill the null? For example:

A B
56 Michael
34 Paula
79 Davi
80 Luna
56 NaN

So I want code that identifies the NaN, takes the column A value 56, searches for another 56 in the same column, and fills the NaN with the corresponding name. Assume the DataFrame is very large, so I can't group by A and replace the values one by one.

Shaido

1 Answer

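As mentioned in the comments, ffill can work. However, if a NaN is the first value in its group, ffill has nothing to carry forward and leaves that NaN unchanged. So sort df by 'B' first (NaN rows are placed last by default) and then apply ffill within each group: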

df.sort_values('B', ignore_index=True, inplace=True)
df['B'] = df.groupby('A')['B'].ffill()
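To make the leading-NaN problem concrete, here is a minimal sketch on the question's sample data, with one extra row added as an assumption: ID 34 appears first with a missing name. The variable names `unsorted_fill` and `sorted_df` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus an assumed extra row where
# ID 34 shows up with a missing name before its named row.
df = pd.DataFrame({
    "A": [34, 56, 34, 79, 80, 56],
    "B": [np.nan, "Michael", "Paula", "Davi", "Luna", np.nan],
})

# Grouped ffill alone: the NaN for ID 34 is first in its group,
# so there is nothing to carry forward and it stays NaN.
unsorted_fill = df.groupby("A")["B"].ffill()

# Sorting by B puts NaN rows last (na_position="last" is the default),
# so each group's known name now comes before its gaps.
sorted_df = df.sort_values("B", ignore_index=True)
sorted_df["B"] = sorted_df.groupby("A")["B"].ffill()
```

Note that sorting changes the row order, and `ignore_index=True` discards the original index, so this fits best when the original order does not matter.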

Or you can look up the known name for each ID, ignoring the NaN entries, and map it onto the 'A' column. GroupBy.first() returns the first non-null value in each group, so it yields one name per ID:

df['B'] = df['A'].map(df.groupby('A')['B'].first())

That said, ffill itself is probably faster, if the cost of the sort is left out.
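If the original row order has to be preserved, the lookup idea can also be written so that only the gaps are touched. This is a sketch under the question's assumption that each ID has a single name; the lookup table `names` is a hypothetical helper, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [56, 34, 79, 80, 56],
    "B": ["Michael", "Paula", "Davi", "Luna", np.nan],
})

# Build an ID -> name lookup: drop rows without a name,
# then keep the first remaining row for each ID.
names = df.dropna(subset=["B"]).drop_duplicates("A").set_index("A")["B"]

# Fill only the missing names; rows that already have a name are untouched.
df["B"] = df["B"].fillna(df["A"].map(names))
```

An ID that never appears with a name has no entry in the lookup, so its rows simply stay NaN.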

SomeDude
  • I want to read what is written in your profile picture, but it's not very clear, can you please tell me what is that? – Sunderam Dubey Jul 08 '22 at 06:04
  • If the first value will be NaN, I think we can put it as `df.groupby('A')['B'].ffill().bfill()`? Save the hassle of sorting it –  Jul 08 '22 at 10:15
  • @KevinChoonLiangYew No, that won't work. The result of the grouped ffill is an ordinary Series, and rows with the same 'A' are not necessarily adjacent in it, so a plain bfill can fill a gap with the value of a different 'A'. You can do the bfill after another groupby. But I think sort and ffill is more performant. – SomeDude Jul 08 '22 at 12:58
  • @SunderamDubey it can be read if you try ;) – SomeDude Jul 08 '22 at 14:02
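The point raised in the comments about chaining ffill and bfill can be checked with a small sketch. It reuses the question's sample data plus an assumed row where ID 34 appears first without a name; `wrong` and `forward` are illustrative names.

```python
import numpy as np
import pandas as pd

# Question's sample data plus an assumed leading row for ID 34 with no name.
df = pd.DataFrame({
    "A": [34, 56, 34, 79, 80, 56],
    "B": [np.nan, "Michael", "Paula", "Davi", "Luna", np.nan],
})

# Ungrouped bfill after the grouped ffill: the leftover NaN for ID 34
# is filled from the next row, which belongs to ID 56.
wrong = df.groupby("A")["B"].ffill().bfill()

# Grouping again before bfill keeps the fill inside each ID.
forward = df.groupby("A")["B"].ffill()
df["B"] = forward.groupby(df["A"]).bfill()
```

In this sketch `wrong` assigns ID 34 the name that belongs to ID 56, while the second version keeps the original row order and fills each gap from its own group.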