
Say I have sample data like this:

[Image: sample data]

I want to generate a summary DataFrame containing sample text from desc1 and desc2 for a large data set (about 20,000 rows).

I will have more columns like desc3, desc4, etc., and I may want to include additional desc_n samples in the result.

[Image: desired summary output]

The purpose is to get an idea of what the unique names are (grouped by name), then see sample text from the other fields concatenated, plus a count of unique desc1 values.

ihightower
Reading [this](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) on good pandas questions might be helpful. – DSM Mar 13 '17 at 17:09

1 Answer


You can do something like this:

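import pandas as pd

# df is assumed to hold the question's data: a 'name' column plus desc1, desc2, ...

# customized aggregation function: join a group's distinct non-null values into one
# comma-separated string (unique() keeps first-seen order, unlike set(), and
# dropna()/astype(str) stop join() from failing on NaN or non-string values)
join_unique = lambda x: ','.join(x.dropna().astype(str).unique())

# aggregate columns desc1 and desc2 respectively
df1 = df.groupby('name').agg({'desc1': [join_unique, 'nunique'], 'desc2': join_unique})

# flatten the MultiIndex columns: ('desc1', 'nunique') -> 'desc1_nunique',
# ('desc1', '<lambda>') -> 'desc1_samp', ('desc2', '<lambda>') -> 'desc2_samp'
df1.columns = ['_'.join(x) if x[1] == 'nunique' else x[0] + "_samp" for x in df1.columns]

df1   # call df1.reset_index() if you want 'name' back as a regular column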

[Image: resulting summary DataFrame]
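To check the approach end to end, here is a self-contained run on a small DataFrame. The question's data is only available as an image, so the values below are made up for illustration. The join uses `x.dropna().astype(str).unique()` rather than `set(x)`, which keeps the sample text in first-seen order and skips missing values, and `reset_index()` is called at the end so `name` becomes a regular column.

```python
import pandas as pd

# Hypothetical sample data standing in for the image in the question
df = pd.DataFrame({
    'name':  ['a', 'a', 'a', 'b', 'b'],
    'desc1': ['x', 'y', 'x', 'z', 'z'],
    'desc2': ['p', 'q', 'p', 'r', 's'],
})

# Join the distinct non-null values of a group, in first-seen order
join_unique = lambda x: ','.join(x.dropna().astype(str).unique())

# A list value for desc1 makes pandas return MultiIndex columns for every key
df1 = df.groupby('name').agg({'desc1': [join_unique, 'nunique'], 'desc2': join_unique})

# Flatten: ('desc1', 'nunique') -> 'desc1_nunique', ('desc1', '<lambda>') -> 'desc1_samp', ...
df1.columns = ['_'.join(x) if x[1] == 'nunique' else x[0] + '_samp' for x in df1.columns]
df1 = df1.reset_index()
print(df1)
```

Group `a` should come out as `desc1_samp='x,y'`, `desc1_nunique=2`, `desc2_samp='p,q'`, and group `b` as `'z'`, `1`, `'r,s'`.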

Psidom
Excellent! I need to practice this more to understand it well enough to use with my real data. Thank you. – ihightower Mar 13 '17 at 17:20
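The question also mentions desc3, desc4 and further desc_n columns. One way to avoid listing each column by hand, assuming pandas 0.25 or later (which added named aggregation), is to build the aggregation spec from every column whose name starts with a prefix. The `summarize()` helper and its `key`, `prefix` and `count_cols` parameters are hypothetical names for this sketch, the sample data is made up, and a plain prefix match would also pick up a column such as `description`.

```python
import pandas as pd


def summarize(df, key='name', prefix='desc', count_cols=('desc1',)):
    """Hypothetical helper: group df by `key`, build a comma-separated sample
    of the distinct values of every column starting with `prefix`, and add a
    distinct-value count for each column listed in `count_cols`."""
    def join_unique(x):
        # distinct non-null values, in first-seen order
        return ','.join(x.dropna().astype(str).unique())

    desc_cols = [c for c in df.columns if str(c).startswith(prefix)]

    # Named aggregation: output_column=(input_column, aggfunc)
    spec = {f'{c}_samp': (c, join_unique) for c in desc_cols}
    spec.update({f'{c}_nunique': (c, 'nunique') for c in count_cols})
    return df.groupby(key).agg(**spec).reset_index()


# Usage on hypothetical data with a third desc column and a missing value
df = pd.DataFrame({
    'name':  ['a', 'a', 'b'],
    'desc1': ['x', 'y', 'z'],
    'desc2': ['p', None, 'r'],
    'desc3': ['m', 'm', 'n'],
})
out = summarize(df)
print(out)
```

Named aggregation keeps the output columns in the order of the keyword arguments, so all the `_samp` columns come first, followed by the `_nunique` counts; passing a different `count_cols` tuple adds counts for other columns.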