I have a table that tracks changes made to each field for a salesforce record. My goal here is to group by the saleforce_id column and merge all the rows into one replacing null values with text values if there are any text values. I've tried different variations of groupby but can't seem to get the desired output.
Asked
Active
Viewed 711 times
-1
-
Please provide a [reproducible minimal example](https://stackoverflow.com/q/20109391/8107362). Especially, provide some [sample data](https://stackoverflow.com/q/22418895/8107362), e.g. with `print(df.to_dict())`. A picture of your data is not easy to digest. – mnist Nov 20 '21 at 20:49
-
`df1 = df.groupby('salesforce_id', as_index=False).first()` based on [this answer](https://stackoverflow.com/a/59048573/15497888) by [jezrael](https://stackoverflow.com/users/2901002/jezrael)? – Henry Ecker Nov 20 '21 at 20:58
2 Answers
1
This should do what you what:
df.groupby('salesforce_id').first().reset_index(drop=True)
That will merge all the columns into one, keeping only the non-NaN value for each run (unless there are no non-NaN values in all the columns for that row; then the value in the final merged column will be NaN).
0
Use melt
and pivot
:
out = df.melt('id').dropna() \
.pivot('id', 'variable', 'value') \
.rename_axis(index=None, columns=None)
print(out)
# Output:
A B C
1 A1 B1 C2
Setup:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id': [1, 1, 1],
'A': ['A1', np.nan, np.nan],
'B': [np.nan, 'B1', np.nan],
'C': [np.nan, np.nan, 'C2'],
'D': [np.nan, np.nan, np.nan]})
print(df)
# Output:
id A B C D
0 1 A1 NaN NaN NaN
1 1 NaN B1 NaN NaN
2 1 NaN NaN C2 NaN

Corralien
- 109,409
- 8
- 28
- 52
-
I think [jezrael answer](https://stackoverflow.com/a/59048573/15497888)'s is better for this `df.groupby('id', as_index=False).first()` if this is, in fact, what OP is looking for. – Henry Ecker Nov 20 '21 at 21:00
-
@HenryEcker. What do you think about *if there are any text values* in the question. I understand the OP doesn't want the column if there is no text value. If I'm right the answer of @jezrael is incomplete in this case (missing `.dropna(axis=1, how='all')`) – Corralien Nov 20 '21 at 21:09
-
I don't see any indication that they want to drop columns that are entirely NaN. It looks that they want a text value from each column _should one exist_ otherwise keep the column as NaN. If the question does _not_ answer the question OP can edit the question and ping me why the answer does not work and it can be reopened. – Henry Ecker Nov 20 '21 at 21:12