I'm trying to reduce the size of my data set, so I came up with this condition: for each month, I want to randomly select only 5 to 10 sale records for each dealer (each dealer has a unique ID). The data looks like this:
Date | Product | Revenue | Dealer ID
Jan 7,18| XXX | 10 | 1212
Jan 7,18| YYY | 13 | 1212
Jan 7,18| XXX | 20 | 2500
Jan 7,18| ZZZ | 5 | 1212
....
Jan 8,18| ZZZ | 15 | 1212
Jan 8,18| AAA | 17 | 2500
Jan 8,18| MMM | 9 | 1318
...
The output for one dealer's January sale records should look like this:
Date | Product | Revenue | Dealer ID
Jan 7,18 | XXX | 10 | 1212
Jan 7,18 | ZZZ | 5 | 1212
Jan 10,18| ZZZ | 15 | 1212
Jan 17,18| AAA | 17 | 1212
Jan 22,18| MMM | 9 | 1212
Jan 27,18| ZZZ | 15 | 1212
Jan 28,18| MMM | 9 | 1212
...
My plan is a nested for loop: for each dealer ID, for each month, randomly choose n entries, with n being a random number from 5 to 10. I'm not sure how to loop through the months, and I can't find a way to grab random entries.
Does anyone have an easier way to do this task? Here's my attempt:
    import numpy as np

    unique_ID = np.unique(df['Dealer ID'].sort_values(ascending=True))
    months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
    years = range(2018, 2022)
    for y in years:
        for m in months:
            for i in unique_ID:
                # pick out all the entries with that Dealer ID
                # (a boolean mask, because `if df['Dealer ID'] == i:` raises a ValueError)
                dealer_rows = df[df['Dealer ID'] == i]
                # TODO: keep only the rows that fall in month m of year y
                # TODO: randomly select 8 entries from each dealer and store them
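
For reference, the nested loop described above could probably be replaced by a single pandas groupby over dealer and month. The following is only a rough sketch under a few assumptions: the function name `sample_per_dealer_month` and its `low`, `high`, and `seed` parameters are made up for illustration, every value in `Date` is assumed to look like "Jan 7,18", and a dealer with fewer than n rows in a month simply keeps all of them.

```python
import numpy as np
import pandas as pd


def sample_per_dealer_month(df, low=5, high=10, seed=None):
    """Keep a random number (low..high) of rows per dealer per calendar month.

    Hypothetical helper, not part of pandas.
    """
    rng = np.random.default_rng(seed)

    # Assumption: dates are strings such as "Jan 7,18" (abbreviated month, day, 2-digit year).
    dates = pd.to_datetime(df["Date"].str.strip(), format="%b %d,%y")
    # A monthly period (e.g. 2018-01) keeps January 2018 separate from January 2019.
    month = dates.dt.to_period("M")

    kept = []
    for _, group in df.groupby([df["Dealer ID"], month]):
        # The upper bound of rng.integers is exclusive, hence high + 1.
        n = int(rng.integers(low, high + 1))
        # A dealer may have fewer than n sales in a given month.
        n = min(n, len(group))
        # Derive a per-group seed so the whole run is reproducible from `seed`.
        group_seed = int(rng.integers(0, 2**32 - 1))
        kept.append(group.sample(n=n, random_state=group_seed))

    # Restore the original row order.
    return pd.concat(kept).sort_index()
```

Calling `sample_per_dealer_month(df, seed=0)` should return a smaller DataFrame with the same columns. Grouping on a derived month period means no explicit loop over month names or years is needed.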