Pandas merging rows with the same value and same index with multiple empty values

Question

I am a newbie in pandas. I am trying the same thing as this answer is saying.

I have data like

SubjectID       Visit    Value1    Value2    Value3
                 1         1.57      1.75     1.56   
                 1         1.56       N/A     N/A 
    B1           2         N/A       1.56     N/A

and I want data like

SubjectID       Visit    Value1    Value2    Value3
                 1     1.57,1.56      1.75     1.56    
    B1           2         N/A       1.56     N/A

What I tried is

#1. Did not work    
df.groupby(['SubjectID', 'Visit'])['Value1']\
  .apply(lambda val: ','.join(val)).reset_index()

#2. did not work
df = df.groupby(['SubjectID', 'Visit'])['Value1']\
       .apply(lambda val: ','.join(val)).reset_index() 

#3. did not work
df.groupby(['SubjectID', 'Visit'])['Value1', 'Value2', 'Value3']\
  .apply(lambda v1, v2, v3 : ','.join(val1), ','.join(val2), ','.join(val3)).reset_index()

#4. did not work
df.groupby(['SubjectID', 'Visit'])\
       ['Value1'].apply(lambda v1:','.join(val1)) \
       ['Value2'].apply(lambda v2:','.join(val2)) \
       ['Value3'].apply(lambda v3:','.join(val3)).reset_index()

Any suggestions to get this working?

jezrael · Accepted Answer · 2019-08-17T12:24:01.683

3

Use GroupBy.agg with custom lambda function with remove missing values by Series.dropna and cast to strings and if all values are missing is returned np.nan:

f = lambda val: np.nan if val.isna().all() else ','.join(val.dropna().astype(str))
df = df.groupby(['SubjectID', 'Visit']).agg(f).reset_index()
print (df)
  SubjectID  Visit     Value1  Value2  Value3
0                1  1.57,1.56    1.75    1.56
1        B1      2        NaN    1.56     NaN

If need empty string instead missing values, solution is simplier:

f = lambda val: ','.join(val.dropna().astype(str))
df1 = df.groupby(['SubjectID', 'Visit']).agg(f).reset_index()
print (df1)
  SubjectID  Visit     Value1  Value2 Value3
0                1  1.57,1.56    1.75   1.56
1        B1      2               1.56

edited Aug 17 '19 at 12:24

answered Aug 17 '19 at 12:16

jezrael

822,522
95
1,334
1,252

I want the same on columns Value2 and Value3 also, do I add multiple `agg`? Please explain – Kishor Pawar Aug 17 '19 at 12:18
@KishorPawar - Yes, or if need processing all columns without specified in `groupby` like `['SubjectID', 'Visit']` only remove column after `groupby`, if need select multiple columns use `df.groupby(['SubjectID', 'Visit'])['Value1', 'Value2', 'Value3']` – jezrael Aug 17 '19 at 12:27
The above did not work, can you please suggest more? – Kishor Pawar Aug 17 '19 at 12:30
@KishorPawar - Can you be more specific `The above did not work` ? – jezrael Aug 17 '19 at 12:31
It did not make any changes to `df`, I did not merge. – Kishor Pawar Aug 17 '19 at 12:32
@KishorPawar - Ok, what is `print (df.dtypes)` ? – jezrael Aug 17 '19 at 12:32
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/198082/discussion-between-kishor-pawar-and-jezrael). – Kishor Pawar Aug 17 '19 at 12:36

Allen Qin · Answer 2 · 2019-08-17T12:41:02.903

1

Use groupby and join the values from same group:

(
    df.fillna('')
    .astype(str).groupby(by=['SubjectID', 'Visit'])
    .agg(lambda x: ','.join(x.loc[x!='']))
    .reset_index()
)

    SubjectID   Visit   Value1      Value2  Value3
0               1       1.57,1.56   1.75    1.56
1   B1          1                   1.56

edited Aug 17 '19 at 12:41

answered Aug 17 '19 at 12:27

Allen Qin

19,507
8
51
67

I do not want to group only by `SubjectID`. – Kishor Pawar Aug 17 '19 at 12:35

Pandas merging rows with the same value and same index with multiple empty values

2 Answers2