Pandas Grouping by Id and getting non-NaN values

Question

I have a table that tracks changes made to each field for a salesforce record. My goal here is to group by the saleforce_id column and merge all the rows into one replacing null values with text values if there are any text values. I've tried different variations of groupby but can't seem to get the desired output.

Please provide a [reproducible minimal example](https://stackoverflow.com/q/20109391/8107362). Especially, provide some [sample data](https://stackoverflow.com/q/22418895/8107362), e.g. with `print(df.to_dict())`. A picture of your data is not easy to digest. — mnist, Nov 20 '21 at 20:49
`df1 = df.groupby('salesforce_id', as_index=False).first()` based on [this answer](https://stackoverflow.com/a/59048573/15497888) by [jezrael](https://stackoverflow.com/users/2901002/jezrael)? — Henry Ecker, Nov 20 '21 at 20:58

score 1 · Accepted Answer · 2021-11-20T21:38:50.260

1

This should do what you what:

df.groupby('salesforce_id').first().reset_index(drop=True)

That will merge all the columns into one, keeping only the non-NaN value for each run (unless there are no non-NaN values in all the columns for that row; then the value in the final merged column will be NaN).

edited Nov 20 '21 at 21:38

answered Nov 20 '21 at 21:01

Corralien · Answer 2 · 2021-11-20T21:21:11.547

0

Use melt and pivot:

out = df.melt('id').dropna() \
        .pivot('id', 'variable', 'value') \
        .rename_axis(index=None, columns=None)
print(out)

# Output:
    A   B   C
1  A1  B1  C2

Setup:

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': [1, 1, 1],
                   'A': ['A1', np.nan, np.nan],
                   'B': [np.nan, 'B1', np.nan],
                   'C': [np.nan, np.nan, 'C2'],
                   'D': [np.nan, np.nan, np.nan]})
print(df)

# Output:
   id    A    B    C   D
0   1   A1  NaN  NaN NaN
1   1  NaN   B1  NaN NaN
2   1  NaN  NaN   C2 NaN

edited Nov 20 '21 at 21:21

answered Nov 20 '21 at 20:58

Corralien

109,409
8
28
52

I think [jezrael answer](https://stackoverflow.com/a/59048573/15497888)'s is better for this `df.groupby('id', as_index=False).first()` if this is, in fact, what OP is looking for. – Henry Ecker Nov 20 '21 at 21:00
@HenryEcker. What do you think about *if there are any text values* in the question. I understand the OP doesn't want the column if there is no text value. If I'm right the answer of @jezrael is incomplete in this case (missing `.dropna(axis=1, how='all')`) – Corralien Nov 20 '21 at 21:09
I don't see any indication that they want to drop columns that are entirely NaN. It looks that they want a text value from each column _should one exist_ otherwise keep the column as NaN. If the question does _not_ answer the question OP can edit the question and ping me why the answer does not work and it can be reopened. – Henry Ecker Nov 20 '21 at 21:12

Pandas Grouping by Id and getting non-NaN values

2 Answers2