Pandas - Merge rows in a DataFrame

Question

I'm trying to clean up some data.

The DataFrame currently looks something like this:

    id  data data2
0   12  NaN  50.0
1   12  a    50.0
2   12  a    NaN
3   52  b    NaN
4   52  NaN  20.0
5   52  NaN  20.0

I'd like to collapse the rows to remove duplicate entries and keep only the valid values, grouping on the id column in this case and disregarding NaNs, to end up with:

    id  data data2
0   12  a    50
1   52  b    20

So what do you mean by "disregarding NaNs"? What are the duplicates here? Your question is a bit broad. — cs95, May 25 '18 at 18:39
You're right - I forgot to specify it should group using the ID column. I edited the question for clarification - thanks for pointing it out! — velxundussa, May 25 '18 at 18:41
It is indeed! I figured there must have been a way to do this that'd be simple, I did not expect /that/ simple though. — velxundussa, May 25 '18 at 18:45
Once you can explain the question, the solution presents itself. Good luck! — cs95, May 25 '18 at 18:54
@HarvIpan Not 100% sure, but it may be duplicate. Feel free to edit your answer with my comment, that's fine with me. — cs95, May 25 '18 at 18:55

harvpan · Accepted Answer · 2018-05-25T18:58:32.530

2

You need:

df.groupby('id', as_index=False).first()

Output:

    id  data    data2
0   12  a      50.0
1   52  b      20.0
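For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the frame from the question by hand (the constructor call is my reconstruction, not something the asker posted) and then applies the same one-liner. It works because `GroupBy.first()` takes the first non-null value of each column within each group.

```python
import numpy as np
import pandas as pd

# Reconstruction of the frame shown in the question (assumed dtypes:
# object for "data", float for "data2").
df = pd.DataFrame({
    "id": [12, 12, 12, 52, 52, 52],
    "data": [np.nan, "a", "a", "b", np.nan, np.nan],
    "data2": [50.0, 50.0, np.nan, np.nan, 20.0, 20.0],
})

# first() picks the first non-null entry per column within each id group;
# as_index=False keeps "id" as a regular column instead of the index.
result = df.groupby("id", as_index=False).first()
print(result)
```

Keep in mind that the non-null lookup happens per column, so the values in one result row can come from different source rows (here, `data` for id 52 comes from row 3 and `data2` from row 4).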

edited May 25 '18 at 18:58

answered May 25 '18 at 18:43


@coldspeed, thank you. – harvpan May 25 '18 at 18:58
