Combine rows of pandas df with the same index in the same df

Question

If I have a df that looks something like this:

                 v1          v2  ...   v10         v11
id                               ...
102717.0  101234650  2018-08-27  ...   NaN         NaN
102717.0  101234650  2018-08-27  ...  UDMS  27/08/2018
102717.0  101234650  2018-08-27  ...   NaN         NaN
102717.0  101234650  2018-08-27  ...  UDMS  27/08/2018

So when the id col matches how could I combine these to just 1 row?

Desired output would be something like:

                 v1          v2  ...   v10         v11
id                               ...
102717.0  101234650  2018-08-27  ...  UDMS  27/08/2018

So the script would check for all values across each row that are repeated and then reduce it down filling any NaN values...

Will that combine the columns on the groupby? Surely this will mean I lose some data? – Bob, Mar 23 '20 at 15:36
Can you provide the desired output so we can understand how to combine them? – Alexandre B., Mar 23 '20 at 15:44
@Bob your question is ambiguous. You need to be as explicit as you can in regards to how you expect the rows to be "combined". I attempted to give some help but I can't devote too much time to guessing what you mean to happen. Clarify your question and people will be able to help more. – piRSquared, Mar 23 '20 at 15:47
@piRSquared: I appreciate your suggestion, I was trying to open a discussion with you about it. Updated my question with some form of output, hope that helps. – Bob, Mar 23 '20 at 15:49
In regards to losing data... yes, you'll lose data. However, if you intend to preserve data, show an example of data that you'd like to preserve and how you'd like it to be presented or preserved. – piRSquared, Mar 23 '20 at 15:56
This topic might help: [Python / Pandas: How to merge rows in dataframe](https://stackoverflow.com/questions/45163159/python-pandas-how-to-merge-rows-in-dataframe) – Alexandre B., Mar 23 '20 at 16:02
@AlexandreB. Thank you. I think I want some sort of concat on index maybe... if the df has repeated rows, reduce it down by checking the cols with NaN in and populating them. – Bob, Mar 23 '20 at 16:06

score 0 · Answer 1 · answered Mar 23 '20 at 16:07

It really depends on what your results should look like. E.g. does v2 always contain the same date for the corresponding id? From what I guess you're trying to do I'd do the following:

import numpy as np
# Average each column per id, ignoring NaN (only meaningful for numeric columns)
mean_dict = {col: np.nanmean for col in df.columns}
newdf = df.groupby('id').agg(mean_dict)

I hope that helps. With more detailed information of your input and desired output we might be able to help you better.
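
Whether the repeated rows really agree with each other can be checked before collapsing anything. The following is only a minimal sketch, assuming `id` is the index as in the question; the sample frame and its values are made up for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame (id is the index).
df = pd.DataFrame(
    {
        "v1": [101234650, 101234650, 101234650],
        "v2": ["2018-08-27", "2018-08-27", "2018-08-27"],
        "v10": [np.nan, "UDMS", np.nan],
    },
    index=pd.Index([102717.0, 102717.0, 102717.0], name="id"),
)

# Count distinct non-NaN values per column within each id.
distinct = df.groupby(level="id").nunique()

# True when no column holds conflicting values for any id.
consistent = bool((distinct <= 1).all().all())
print(consistent)
```

If `consistent` comes out false, the rows for some id disagree in at least one column, and any way of combining them will have to pick one value over another.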

TiTo

Thank you. The data that is repeated should all be the same, but some rows will return NaN in cols and the other rows should be able to fill it... if that makes sense. So repeated rows should fill gaps of other rows. – Bob Mar 23 '20 at 16:10
OK, then try `df.groupby('id').apply(pd.Series.first_valid_index)` – TiTo Mar 23 '20 at 16:19
Ran this and it creates some Series with an index, but I lost the rest of the df – Bob Mar 23 '20 at 16:23
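
For the behaviour described in these comments (repeated rows filling each other's gaps), one option that may fit is `GroupBy.first()`, which returns the first non-null value of each column within each group. This is a sketch rather than a confirmed solution: it assumes `id` is the index and that duplicate rows never disagree, and the sample data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question: four rows sharing one id,
# two of them with NaN in v10 and v11.
df = pd.DataFrame(
    {
        "v1": [101234650] * 4,
        "v2": ["2018-08-27"] * 4,
        "v10": [np.nan, "UDMS", np.nan, "UDMS"],
        "v11": [np.nan, "27/08/2018", np.nan, "27/08/2018"],
    },
    index=pd.Index([102717.0] * 4, name="id"),
)

# Group on the index level and keep the first non-null value per column,
# so NaN cells are filled from the other rows with the same id.
combined = df.groupby(level="id").first()
print(combined)
```

If `id` is a regular column rather than the index, `df.groupby('id').first()` should behave the same way, with `id` becoming the index of the result.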
