I have a DataFrame df1
df1 = pd.DataFrame({
"id": [1, 1, 2, 2, 3, 3],
"text": ["a", "a", "b", "b", np.nan, np.nan],
"value1": [2, np.nan, 6, np.nan, 7, np.nan],
"value2": [np.nan, 8, np.nan, 1, np.nan, 9],
"value3": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]
})
id text value1 value2 value3
0 1 a 2.0 NaN NaN
1 1 a NaN 8.0 NaN
2 2 b 6.0 NaN NaN
3 2 b NaN 1.0 NaN
4 3 NaN 7.0 NaN NaN
5 3 NaN NaN 9.0 NaN
My goal is to group df1
by id
to get the following result:
id text value1 value2 value3
0 1 a 2 8 NaN
1 2 b 6 1 NaN
2 3 NaN 7 9 NaN
I tried the following code:
def f(x):
y = x.dropna()
return np.nan if y.empty else y
df1.groupby(["id"]).agg(f).reset_index()
but in the result the "text" column is missing because of the NaN.
id value1 value2 value3
0 1 2.0 8.0 NaN
1 2 6.0 1.0 NaN
2 3 7.0 9.0 NaN
How can I solve this?