edit: rewording
I have a list of accounts: accountdf <-
Account Owner Value
Acc1 Jeff 500
Acc2 Jeff 100
Acc3 Jeff 200
Acc4 Jeff 700
Acc5 Matt 400
Acc6 Matt 500
Acc7 Matt 600
Acc8 Matt 20
Acc9 Matt 80
Acc10 Matt 1400
Acc11 Cindy 50
Acc12 Cindy 50
Acc13 Cindy 150
Acc14 Cindy 600
Acc15 Cindy 600
Acc16 Cindy 1350
Acc17 Cindy 800
Acc18 Cindy 400
Acc19 Sarah 1500
Acc20 Sarah 3000
Acc21 1000
Acc22 400
Acc23 500
Acc24 800
Acc25 900
When I summarise this list I get this overview of count of accounts and total value of accounts
namesummarydf <-
Name Accounts Value
Jeff 4 1500
Matt 6 3000
Cindy 8 4000
Sarah 2 3500
Mark 0 0
I have added Mark because he will now be an account owner, but he currently does not own any accounts.
The first step is to look at the average of accounts that everyone should have. There are 25 accounts, and 5 owners so everyone should have 5 accounts. The value total is 15,600, and there are 5 owners so everyone should have around 3,120 in account value. This is an estimation, so I'm fine with it being off by a bit. (In my actual data I have 300 owners and 80,000 accounts, some of which are owned and others which aren't)
Since everyone should have 5 accounts, we need to take accounts away from people that own too many accounts AND/OR too much value (over 3120). Matt has 6 accounts worth 3000 and Cindy has 8 worth 4000, and Sarah has 2 accounts that are worth 3500.
What I was trying to do was a for-loop that iterated through each of the names in the summary list, and looked to see if they had more than average accounts or more than average value. If yes, then look through the accounts list for only accounts in their name, and randomly select one of them within a quartile range of their own value. This random selection would be to specifically take away large value accounts, medium value or low value, not completely random. I wanted to take a sample of accounts between say the 20% percentile and 40% percentile in order to keep the high value accounts intact. the percentile range would be $$ of each account.
If someone had accounts ranging from $50 to $10000, then 20% and 40% would be around the $2,000 - $4,000 range for individual account value.
The while loop would remove one random account and then look back at the new count, and total value of that person's name (it would now be 1 account less, and say $2,000 in value less). It would then see if it was still above average and continually remove accounts until it hit the threshold. It would do this for everyone so it would systematically remove accounts from people's names until they had fewer accounts and few $$.
The next step which I have figured out already is a for loop that iterates through the accounts list on accounts that have no owner and assigns them to the person with the lowest value associated to their name.
My trouble with the while loop is that I wanted it to look and see if the person has MORE than 5 accounts, OR, MORE than average $$s. Sarah for example has 2 accounts but has 3500. One high value account should be removed from her list so she has room to then get assigned smaller accounts to fill up her quota of 5 accounts.
for (p in 1:nrow(namesummarydf)){
nameidx <- namesummarydf$Name[p]
while (namesummarydf$count[p] > mean(namesummarydf$count) | namesummarydf$Value[p] > mean(namesummarydf$Value)){
sample(accountdf , which(accountdf$Owner == nameidx &
(accountdf$totalnewcovalue > quantile(accountdf$Value, prob = 0.15) &
accountdf$totalnewcovalue < quantile(accountdf $Value, prob = 0.45))), 1)
}
}