Pandas: retain n% of data for unique values?

Question

I have a table of products and their reviews:

ProductID  Comment
  1        Great product!
  2        Terrible
  2        Amazing!

The table (a CSV) has about 170,000 rows. I'm looking to retain 5% of the comments for each unique ProductID. Is there a functionality in Pandas that will let me do this?

`Is there a functionality in Pandas that will let me do this?` - yes. — wwii, Jul 10 '22 at 00:55
Does [Pandas - Group by one column and aggregate other column to list](https://stackoverflow.com/questions/65093644/pandas-group-by-one-column-and-aggregate-other-column-to-list) answer your question? — wwii, Jul 10 '22 at 00:59

score 0 · Accepted Answer · answered Jul 10 '22 at 01:26


You could use `groupby` with `sample`:

df.groupby('ProductID').sample(frac=.05)
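As a rough, self-contained sketch of what that one-liner does: the frame below is made-up stand-in data (not the asker's CSV), and `random_state` is only there to make the draw repeatable. `GroupBy.sample` needs pandas 1.1 or later. Note that the per-group row count is rounded, so a product with only a few comments can end up with no rows at all. The `sample_per_product` helper and its `min_rows` parameter are hypothetical names, added on the assumption that you might want at least one comment per product; the question does not ask for that.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: three products with 100, 40, and 3 comments.
df = pd.DataFrame({
    "ProductID": [1] * 100 + [2] * 40 + [3] * 3,
    "Comment": [f"comment {i}" for i in range(143)],
})

# Draw 5% of the rows within each ProductID; random_state makes the draw repeatable.
sampled = df.groupby("ProductID").sample(frac=0.05, random_state=0)
counts = sampled.groupby("ProductID").size().to_dict()


def sample_per_product(frame, frac=0.05, min_rows=1, seed=0):
    """Hypothetical helper: sample `frac` of each product, but never fewer than `min_rows`."""
    parts = []
    for _, group in frame.groupby("ProductID"):
        # Round like the plain frac-based sample, then enforce the floor and cap at group size.
        n = min(len(group), max(min_rows, round(frac * len(group))))
        parts.append(group.sample(n=n, random_state=seed))
    return pd.concat(parts)


sampled_min = sample_per_product(df)
counts_min = sampled_min.groupby("ProductID").size().to_dict()

print(counts)      # rows kept per product with the plain one-liner
print(counts_min)  # rows kept per product with the at-least-one variant
```

If dropping the tiny products is fine for your use case, the one-liner on its own is enough and the helper can be skipped.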


Qdr

