I have a dataset of 50K user records (identified by email) and I need to select only 10K of them, following a predefined ratio of values in each category: Region (the Geo column), Role and Position.

For example, given the following sample of data (11 rows), how can I subset it to get 5 rows, split the following way:
- Geo: 80% AMER, 20% INDIA
- Role: 60% Sales, with the rest chosen at random
- Position: 20% Manager, 80% Operational

Email             Geo    Role       Position
abs@example.com   AMER   Sales      Manager
sdf@example.com   AMER   Sales      Operational
dsfe@example.com  EMEA   Sales      Manager
sdw@example.com   AMER   Sales      Operational
aydje@example.com EMEA   Sales      Manager
fdsed@example.com AMER   Testing    Operational
Sfe@example.com   AMER   Testing    Manager
dfgt@example.com  INDIA  Testing    Manager
gsdr@example.com  INDIA  Testing    Operational
dmgru@example.com AMER   Marketing  Operational
edr@example.com   INDIA  Marketing  Operational

I expect to get something like this:

Email             Geo    Role       Position
abs@example.com   AMER   Sales      Manager
sdf@example.com   AMER   Sales      Operational
sdw@example.com   AMER   Sales      Operational
fdsed@example.com AMER   Testing    Operational
edr@example.com   INDIA  Marketing  Operational

I'm aware that there will be more than one right solution, especially with more data, but any one is fine, as long as the predefined ratios are respected.
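
One possible way to approach this, as a minimal sketch in Python (the question does not name a language, so that choice is an assumption): convert each ratio into a row count, shuffle the rows, accept a row only while every category still has room for it, and reshuffle if the pass ends short. The function name `quota_sample` and the `"*"` key meaning "any other value" are hypothetical names made up for this illustration, and "the rest at random" is read here as "40% from roles other than Sales". Random restarts are not guaranteed to find a solution on the full 50K records, so treat this as a starting point rather than a definitive method.

```python
import random
from collections import Counter

def quota_sample(rows, n, ratios, max_tries=1000, seed=0):
    """Pick n rows whose per-column counts match the requested ratios.

    ratios maps a column name to {value: share}; the key "*" stands for
    "any other value". Returns a list of rows, or None if no try succeeded.
    """
    # Turn shares into absolute row counts, e.g. 0.8 * 5 -> 4.
    targets = {col: {val: round(share * n) for val, share in shares.items()}
               for col, shares in ratios.items()}
    for col, counts in targets.items():
        if sum(counts.values()) != n:
            raise ValueError(f"shares for {col!r} do not round to {n} rows")

    def bucket(col, value):
        # Map a cell value to its quota bucket, or None if it has no quota.
        if value in targets[col]:
            return value
        return "*" if "*" in targets[col] else None

    rng = random.Random(seed)
    for _ in range(max_tries):
        order = rows[:]
        rng.shuffle(order)
        used = {col: Counter() for col in targets}
        picked = []
        for row in order:
            buckets = {col: bucket(col, row[col]) for col in targets}
            # Accept the row only if every column still has room for it.
            if all(b is not None and used[c][b] < targets[c][b]
                   for c, b in buckets.items()):
                for c, b in buckets.items():
                    used[c][b] += 1
                picked.append(row)
                if len(picked) == n:
                    return picked
    return None

# The 11 sample rows from the question.
data = """abs@example.com AMER Sales Manager
sdf@example.com AMER Sales Operational
dsfe@example.com EMEA Sales Manager
sdw@example.com AMER Sales Operational
aydje@example.com EMEA Sales Manager
fdsed@example.com AMER Testing Operational
Sfe@example.com AMER Testing Manager
dfgt@example.com INDIA Testing Manager
gsdr@example.com INDIA Testing Operational
dmgru@example.com AMER Marketing Operational
edr@example.com INDIA Marketing Operational"""
columns = ("Email", "Geo", "Role", "Position")
rows = [dict(zip(columns, line.split())) for line in data.splitlines()]

sample = quota_sample(rows, 5, {
    "Geo": {"AMER": 0.8, "INDIA": 0.2},
    "Role": {"Sales": 0.6, "*": 0.4},
    "Position": {"Manager": 0.2, "Operational": 0.8},
})
for row in sample or []:
    print(*row.values())
```

Because each column's quotas add up to the sample size, any pass that collects all 5 rows has filled every quota exactly. Which of the valid subsets comes back depends on the seed. If the restarts keep failing on the real data, the same quotas could instead be handed to an integer programming solver.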

Cettt
az_s