I have a table of products and their reviews:

ProductID  Comment
  1        Great product!
  2        Terrible
  2        Amazing!

The table (a CSV) has about 170,000 rows. I want to retain 5% of the comments for each unique ProductID. Is there a functionality in Pandas that will let me do this?

  • `Is there a functionality in Pandas that will let me do this?` - yes. – wwii Jul 10 '22 at 00:55
  • Does [Pandas - Group by one column and aggregate other column to list](https://stackoverflow.com/questions/65093644/pandas-group-by-one-column-and-aggregate-other-column-to-list) answer your question? – wwii Jul 10 '22 at 00:59

1 Answer

You could use `groupby` with `sample`:

df.groupby('ProductID').sample(frac=.05)
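
A fuller sketch of the same idea, in case it helps. The toy DataFrame below is made up to stand in for the CSV, and `random_state` is only there to make the draw repeatable. As far as I can tell, the per-group sample size is rounded, so products with only a handful of comments can end up with no rows at all; the second part is one possible workaround (an assumption about what you might want, not something the question asked for) that keeps at least one comment per product.

```python
import pandas as pd

# Hypothetical toy data standing in for the ~170,000-row CSV.
df = pd.DataFrame({
    "ProductID": [1] * 40 + [2] * 100 + [3] * 5,
    "Comment": [f"comment {i}" for i in range(145)],
})

# Sample 5% of the rows within each ProductID.
# GroupBy.sample needs pandas 1.1 or newer; random_state makes it reproducible.
sampled = df.groupby("ProductID").sample(frac=0.05, random_state=0)

# Rows kept per product. Product 3 has only 5 comments, and
# round(5 * 0.05) is 0, so it drops out of the result entirely.
print(sampled["ProductID"].value_counts().sort_index())

# Optional variant: keep at least one comment per product by
# computing the sample size for each group explicitly.
parts = [
    group.sample(n=max(1, round(len(group) * 0.05)), random_state=0)
    for _, group in df.groupby("ProductID")
]
sampled_min1 = pd.concat(parts)
print(sampled_min1["ProductID"].value_counts().sort_index())
```

For the real data you would load the file with `pd.read_csv(...)` instead of building `df` by hand.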
Qdr