
I'm trying to reduce the size of my data, so I came up with this condition: for each month, I want to randomly select only 5 to 10 sale records for each dealer (each dealer has a unique ID). The data looks like this:

   Date | Product  | Revenue | Dealer ID
Jan 7,18| XXX      | 10      | 1212
Jan 7,18| YYY      | 13      | 1212
Jan 7,18| XXX      | 20      | 2500
Jan 7,18| ZZZ      | 5       | 1212
....
Jan 8,18| ZZZ      | 15      | 1212
Jan 8,18| AAA      | 17      | 2500
Jan 8,18| MMM      | 9       | 1318
...

and the output of a dealer's January sale record should look like this:

   Date  | Product  | Revenue | Dealer ID
Jan 7,18 | XXX      | 10      | 1212
Jan 7,18 | ZZZ      | 5       | 1212
Jan 10,18| ZZZ      | 15      | 1212
Jan 17,18| AAA      | 17      | 1212
Jan 22,18| MMM      | 9       | 1212
Jan 27,18| ZZZ      | 15      | 1212
Jan 28,18| MMM      | 9       | 1212

...

My idea is a nested for loop: for each dealer ID, for each month, randomly choose n entries, where n is a random number from 5 to 10. I'm not quite sure how to loop through the months, and I can't find a way to grab random entries.

Does anyone have an easier way to do this task? Here's my attempt:

import numpy as np

unique_ID = np.unique(df['Dealer ID'].sort_values(ascending=True))
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
years = range(2018, 2022)
for y in years:
    for m in months:
        for i in unique_ID:
            # pick out all the entries with that Dealer ID
            # (still need to narrow this down to month m of year y)
            dealer_rows = df[df['Dealer ID'] == i]
            # create a list to store them
            # and then randomly select 8 entries from each dealer
PiCubed
  • How is this data stored? In a file? In a dataframe? In a SQL table? And what code have you tried so far? Please provide a [mcve] – G. Anderson Jan 21 '20 at 20:06
  • it's a dataframe. I'm still improvising my code i'll post it shortly – PiCubed Jan 21 '20 at 20:07
  • That's an important detail to [edit] into the original question. It might also be worth a quick look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and providing a sample of what you want your output to look like – G. Anderson Jan 21 '20 at 20:10
  • @G.Anderson I have updated the code, that's the rough idea I have so far. But not sure how to make it work – PiCubed Jan 21 '20 at 21:24
  • So it does not work? If it does, it would better fit on [codereview.se] – Jongware Jan 21 '20 at 22:01
  • Does this answer your question? [Python: Random selection per group](https://stackoverflow.com/questions/22472213/python-random-selection-per-group) – G. Anderson Jan 22 '20 at 16:17
  • @G.Anderson this looks very similar to what i'm trying to do. I will give it a try – PiCubed Jan 23 '20 at 15:39
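
For reference, the per-group random selection mentioned in the comments could be sketched roughly as follows. This is only a sketch under assumptions, not a confirmed solution: the function name `sample_monthly_sales` and its `low`, `high` and `seed` parameters are made up for illustration, the column names `Date` and `Dealer ID` are taken from the sample table, and the date format string `"%b %d,%y"` is a guess based on values such as `Jan 7,18`. Groups with fewer rows than the drawn n are kept whole.

```python
import numpy as np
import pandas as pd


def sample_monthly_sales(df, low=5, high=10, seed=None):
    """Keep a random low..high rows per (Dealer ID, month); smaller groups stay whole."""
    rng = np.random.default_rng(seed)
    # Assumed date format, inferred from sample values like "Jan 7,18".
    dates = pd.to_datetime(df["Date"], format="%b %d,%y")
    # Year-month period, so January 2018 and January 2019 are separate groups.
    month = dates.dt.to_period("M")

    pieces = []
    for _, group in df.groupby([df["Dealer ID"], month], sort=False):
        # integers() excludes the upper bound, hence high + 1.
        n = min(int(rng.integers(low, high + 1)), len(group))
        # Draw n distinct row positions within this dealer/month group.
        positions = rng.choice(len(group), size=n, replace=False)
        pieces.append(group.iloc[positions])

    if not pieces:  # empty input: return an empty frame with the same columns
        return df.iloc[0:0]
    # Restore the original row order.
    return pd.concat(pieces).sort_index()
```

Calling `sample_monthly_sales(df, seed=0)` would return a smaller dataframe with the same columns; passing a seed only makes the selection repeatable. Grouping on a year-month period replaces the explicit loops over years, months and dealer IDs.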

0 Answers