For example, say I have a column called companyId and many other columns I want to keep. In companyId I have values like 100, 101, 102, ..., basically a list of IDs, and each ID appears a different number of times. How do I randomly sample rows based on the companyId column so that the sample preserves the proportion of each ID?
E.g., if I have 500 rows (100 for companyA, 100 for companyB and 300 for companyC) and I want to sample 100 rows from this table, how do I make the sample contain 20 companyA, 20 companyB and 60 companyC rows?
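What is being asked for is proportional stratified sampling: each group gets n * group_size / total rows (here 100 * 100 / 500 = 20, 100 * 100 / 500 = 20, 100 * 300 / 500 = 60), and rows are drawn at random inside each group. Below is a minimal, library-agnostic sketch in plain Python, assuming the table is a list of dicts (one dict per row, so all other columns are kept). The function name stratified_sample, its seed parameter and the sample data are hypothetical. Largest-remainder rounding is one choice (an assumption, not the only option) for making the per-group counts add up to exactly n when the proportions do not divide evenly.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, n, seed=None):
    """Hypothetical helper: draw n rows so each value of row[key] keeps its share.

    rows: list of dicts (one per row; every other column is kept as-is).
    key:  column to stratify on, e.g. "companyId".
    """
    total = len(rows)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of rows")
    rng = random.Random(seed)

    # Group row indices by the value of the stratifying column.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[key]].append(i)

    # Exact (possibly fractional) quota per group: n * group_size / total.
    exact = {k: n * len(idx) / total for k, idx in groups.items()}
    quota = {k: int(q) for k, q in exact.items()}

    # Largest-remainder rounding so the quotas sum to exactly n.
    leftover = n - sum(quota.values())
    by_remainder = sorted(groups, key=lambda k: exact[k] - quota[k], reverse=True)
    for k in by_remainder[:leftover]:
        quota[k] += 1

    # Sample without replacement inside each group, then keep original row order.
    picked = []
    for k, idx in groups.items():
        picked.extend(rng.sample(idx, quota[k]))
    return [rows[i] for i in sorted(picked)]


# Hypothetical data matching the question: 100 companyA, 100 companyB, 300 companyC rows.
table = (
    [{"companyId": "companyA", "row": i} for i in range(100)]
    + [{"companyId": "companyB", "row": i} for i in range(100, 200)]
    + [{"companyId": "companyC", "row": i} for i in range(200, 500)]
)

sample = stratified_sample(table, "companyId", n=100, seed=42)
counts = Counter(r["companyId"] for r in sample)
print(len(sample), dict(counts))
```

If the data is in a pandas DataFrame (pandas 1.1 or later), the same idea is roughly df.groupby("companyId").sample(frac=100 / len(df)). That call rounds each group's count independently, so the total may differ slightly from the target when the proportions are not whole numbers.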