python pandas - get unique count, description text sample (similar to mysql group_concat)

Question

Say I have sample data like this, with a name column and description columns desc1 and desc2:

I want to generate a summary DataFrame with sample text from desc1 and desc2 for a large data set (about 20,000 rows).

I will have more columns like desc3, desc4, etc., and I may want to include additional desc_n samples in the result.

The purpose is to get an idea of what the unique names are (the group-by key), then see sample text from the other fields concatenated, along with the count of unique desc1 values.
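For a single column, the closest pandas analogue of MySQL's GROUP_CONCAT is a groupby followed by an aggregation with str.join. A minimal sketch on a small, hypothetical frame (the name/desc1 values are made up for illustration and are not the asker's data):

```python
import pandas as pd

# hypothetical sample data standing in for the real 20,000-row table
df = pd.DataFrame({
    'name':  ['a', 'a', 'b', 'b', 'b'],
    'desc1': ['x', 'y', 'x', 'x', 'z'],
})

# GROUP_CONCAT(desc1) ... GROUP BY name: concatenate every value per group
concat_all = df.groupby('name')['desc1'].agg(','.join)

# GROUP_CONCAT(DISTINCT desc1): unique() keeps values in order of first appearance
concat_distinct = df.groupby('name')['desc1'].agg(lambda s: ','.join(s.unique()))

# COUNT(DISTINCT desc1) ... GROUP BY name
n_distinct = df.groupby('name')['desc1'].nunique()

print(concat_all)
print(concat_distinct)
print(n_distinct)
```

Note that ','.join raises a TypeError if a group contains NaN or other non-string values, so real data may need a dropna() or astype(str) first.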

Reading [this](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) on good pandas questions might be helpful. — DSM, Mar 13 '17 at 17:09

score 5 · Accepted Answer · answered Mar 13 '17 at 17:11

You can do something like this:

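import pandas as pd

# custom aggregation: join the distinct non-null values of each group into one string
# (sorted so the output order is stable; astype(str) guards against non-string values)
join_unique = lambda x: ','.join(sorted(set(x.dropna().astype(str))))

# aggregate desc1 (sample text + unique count) and desc2 (sample text) per name
df1 = df.groupby('name').agg({'desc1': [join_unique, 'nunique'], 'desc2': join_unique})

# flatten the MultiIndex columns: ('desc1', 'nunique') -> 'desc1_nunique',
# ('desc1', '<lambda>') -> 'desc1_samp', ('desc2', '<lambda>') -> 'desc2_samp'
df1.columns = ['_'.join(x) if x[1] == 'nunique' else x[0] + "_samp" for x in df1.columns]

df1   # 'name' is the index; call df1.reset_index() to turn it back into a column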
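On pandas 0.25 or later, named aggregation produces flat column names directly, so the renaming step is not needed. The sketch below is a variation under stated assumptions, not Psidom's code: it picks up every column whose name starts with desc (covering the desc3, desc4, ... case from the question), caps each sample at n distinct values so the text stays short on a large table, and counts unique values for desc1 only. The data and the sample_unique helper are hypothetical.

```python
import pandas as pd

# hypothetical data; the real table has ~20,000 rows and possibly more desc_n columns
df = pd.DataFrame({
    'name':  ['a', 'a', 'a', 'a', 'b', 'b'],
    'desc1': ['x', 'y', 'y', 'x', 'z', None],
    'desc2': ['p', 'q', 'r', 't', 's', 's'],
    'desc3': ['m', 'm', 'n', 'm', 'o', 'o'],
})

def sample_unique(s, n=3, sep=','):
    """Hypothetical helper: join up to n distinct non-null values, in order of appearance."""
    values = s.dropna().astype(str).unique()[:n]
    return sep.join(values)

# one named aggregation (output_name=(column, func)) per description column
desc_cols = [c for c in df.columns if c.startswith('desc')]
spec = {f'{col}_samp': (col, sample_unique) for col in desc_cols}
spec['desc1_nunique'] = ('desc1', 'nunique')  # unique count for desc1, as in the question

summary = df.groupby('name').agg(**spec).reset_index()
print(summary)
```

Here group a has four distinct desc2 values but desc2_samp shows only the first three (p,q,r), and the None in group b is excluded from both the sample text and the unique count.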

Psidom

Excellent! I need to practice this more to understand it well enough to use with my real data. Thank you. – ihightower Mar 13 '17 at 17:20