1

I have a DataFrame df1

df1 = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "text": ["a", "a", "b", "b", np.nan, np.nan],
    "value1": [2, np.nan, 6, np.nan, 7, np.nan],
    "value2": [np.nan, 8, np.nan, 1, np.nan, 9],
    "value3": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]
})

   id text  value1  value2  value3
0   1    a     2.0     NaN     NaN
1   1    a     NaN     8.0     NaN
2   2    b     6.0     NaN     NaN
3   2    b     NaN     1.0     NaN
4   3  NaN     7.0     NaN     NaN
5   3  NaN     NaN     9.0     NaN

My goal is to group df1 by id to get the following result:

   id text  value1  value2  value3
0   1    a       2       8     NaN
1   2    b       6       1     NaN
2   3  NaN       7       9     NaN

I tried the following code:

def f(x):
    y = x.dropna()
    return np.nan if y.empty else y

df1.groupby(["id"]).agg(f).reset_index()

but in the result the "text" column is missing because of the NaN.

   id  value1  value2  value3
0   1     2.0     8.0     NaN
1   2     6.0     1.0     NaN
2   3     7.0     9.0     NaN

How can I solve this?

comstack
  • 33
  • 2

2 Answers2

1

Just groupby id, call the first and reset the index.

df1.groupby('id').first().reset_index()

   id  text  value1  value2  value3
0   1     a     2.0     8.0     NaN
1   2     b     6.0     1.0     NaN
2   3  None     7.0     9.0     NaN
ThePyGuy
  • 17,779
  • 5
  • 18
  • 45
0

try:

out=df1.groupby('id',dropna=False,as_index=False).first()

output of out:

    id  text    value1  value2  value3
0   1   a       2.0     8.0     NaN
1   2   b       6.0     1.0     NaN
2   3   None    7.0     9.0     NaN
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41