

I have a list of accounts, accountdf:

Account Owner   Value
Acc1    Jeff    500
Acc2    Jeff    100
Acc3    Jeff    200
Acc4    Jeff    700
Acc5    Matt    400
Acc6    Matt    500
Acc7    Matt    600
Acc8    Matt    20
Acc9    Matt    80
Acc10   Matt    1400
Acc11   Cindy   50
Acc12   Cindy   50
Acc13   Cindy   150
Acc14   Cindy   600
Acc15   Cindy   600
Acc16   Cindy   1350
Acc17   Cindy   800
Acc18   Cindy   400
Acc19   Sarah   1500
Acc20   Sarah   2000
Acc21           1000
Acc22           400
Acc23           500
Acc24           800
Acc25           900
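
For reference, a copy/pasteable way to rebuild this data (a sketch using the values above; unowned accounts get NA owners):

accountdf <- data.frame(
  Account = paste0("Acc", 1:25),
  Owner   = c(rep("Jeff", 4), rep("Matt", 6), rep("Cindy", 8), rep("Sarah", 2), rep(NA, 5)),
  Value   = c(500, 100, 200, 700,                     # Jeff
              400, 500, 600, 20, 80, 1400,            # Matt
              50, 50, 150, 600, 600, 1350, 800, 400,  # Cindy
              1500, 2000,                             # Sarah
              1000, 400, 500, 800, 900),              # unowned
  stringsAsFactors = FALSE
)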

When I summarise this list, I get this overview of the count of accounts and total value per owner:

namesummarydf:

Name    Accounts    Value
Jeff    4           1500
Matt    6           3000
Cindy   8           4000
Sarah   2           3500
Mark    0           0

I have added Mark because he will now be an account owner, but he currently does not own any accounts.
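
The summary itself can be reproduced from accountdf in base R (a sketch; Mark is appended by hand since he owns nothing yet):

owned <- accountdf[!is.na(accountdf$Owner), ]
namesummarydf <- data.frame(
  Name     = names(table(owned$Owner)),
  Accounts = as.vector(table(owned$Owner)),
  Value    = as.vector(tapply(owned$Value, owned$Owner, sum)),
  stringsAsFactors = FALSE
)
# Mark will become an owner but has no accounts yet
namesummarydf <- rbind(namesummarydf,
                       data.frame(Name = "Mark", Accounts = 0, Value = 0, stringsAsFactors = FALSE))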

The first step is to work out the average number of accounts everyone should have. There are 25 accounts and 5 owners, so everyone should have 5 accounts. The total value is 15,600, and there are 5 owners, so everyone should have around 3,120 in account value. This is an estimation, so I'm fine with it being off by a bit. (In my actual data I have 300 owners and 80,000 accounts, some of which are owned and some of which aren't.)
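
In code, those targets are just the overall totals divided by the number of owners in the summary:

target_accounts <- nrow(accountdf) / nrow(namesummarydf)        # 25 / 5 = 5
target_value    <- sum(accountdf$Value) / nrow(namesummarydf)   # 15600 / 5 = 3120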

Since everyone should have 5 accounts, we need to take accounts away from people who own too many accounts AND/OR too much value (over 3,120). Matt has 6 accounts worth 3,000, Cindy has 8 worth 4,000, and Sarah has 2 worth 3,500.

What I was trying to do was a for loop that iterates through each name in the summary list and checks whether they have more than the average number of accounts or more than the average value. If they do, it looks through the accounts list at only that person's accounts and randomly selects one of them from within a percentile range of that person's own account values. The random selection is meant to target a specific band of account values (large, medium, or low) rather than being completely random: for example, sampling from accounts between the 20th and 40th percentile of dollar value so the high-value accounts stay intact.

If someone had accounts ranging from $50 to $10,000, the 20th to 40th percentile window would be roughly the $2,000 - $4,000 range for an individual account's value.
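
As a small illustration of that window on the toy data (the owner and the 20%/40% cut-points are just examples):

cindy  <- accountdf$Value[accountdf$Owner %in% "Cindy"]
bounds <- quantile(cindy, probs = c(0.20, 0.40))     # lower / upper dollar cut-points
# indices of Cindy's accounts whose value falls inside that band
candidates <- which(accountdf$Owner %in% "Cindy" &
                    accountdf$Value >= bounds[1] &
                    accountdf$Value <= bounds[2])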

The while loop would remove one random account and then look back at that person's new count and total value (it would now be one account less and, say, $2,000 less in value). It would then check whether they were still above average and keep removing accounts until they hit the threshold. It would do this for everyone, systematically removing accounts from people's names until they each had fewer accounts and fewer $$.

The next step, which I have already figured out, is a for loop that iterates through the accounts with no owner and assigns each one to the person with the lowest total value to their name.
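
A rough sketch of that step, using the columns above:

# assign each unowned account to whoever currently has the lowest total value
for (a in which(is.na(accountdf$Owner))) {
  poorest <- which.min(namesummarydf$Value)
  accountdf$Owner[a]              <- namesummarydf$Name[poorest]
  namesummarydf$Accounts[poorest] <- namesummarydf$Accounts[poorest] + 1
  namesummarydf$Value[poorest]    <- namesummarydf$Value[poorest] + accountdf$Value[a]
}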

My trouble with the while loop is that I want it to check whether the person has MORE than 5 accounts, OR MORE than the average $$. Sarah, for example, has only 2 accounts but 3,500 in value. One high-value account should be removed from her list so she has room to then be assigned smaller accounts and fill her quota of 5 accounts.

for (p in 1:nrow(namesummarydf)) {

  nameidx <- namesummarydf$Name[p]

  while (namesummarydf$Accounts[p] > mean(namesummarydf$Accounts) |
         namesummarydf$Value[p]    > mean(namesummarydf$Value)) {
    # this owner's accounts, then the ones in a mid-value band of their own values
    owned <- which(accountdf$Owner %in% nameidx)
    band  <- owned[accountdf$Value[owned] > quantile(accountdf$Value[owned], prob = 0.15) &
                   accountdf$Value[owned] < quantile(accountdf$Value[owned], prob = 0.45)]
    pool  <- if (length(band) > 0) band else owned   # fall back if the band is empty
    drop  <- if (length(pool) > 1) sample(pool, 1) else pool
    # unassign the chosen account and update the summary so the while condition can change
    accountdf$Owner[drop]     <- NA
    namesummarydf$Accounts[p] <- namesummarydf$Accounts[p] - 1
    namesummarydf$Value[p]    <- namesummarydf$Value[p] - accountdf$Value[drop]
  }
}
Matt W.
  • Could you please articulate a question? And share some reproducible data? Those dollar signs and commas in your numbers are just getting in the way. You will get help much more quickly if you give copy-pasteable example input and the desired output for that input. [See here for tips on sharing data reproducibly](http://stackoverflow.com/q/5963269/903061): use `dput` or share code to simulate it. – Gregor Thomas Feb 03 '17 at 19:51
  • Also please either delete the `while` loop at the top or elaborate on why/how you "know its wrong". Also comment on differences between the first while loop and the while loop you have at the bottom. The first one uses `|`, but the real one uses `&`, which makes them pretty different. – Gregor Thomas Feb 03 '17 at 19:52
  • You do not need looping for this. Vectorized `merge`, `match`, or `aggregate` with `ifelse` may work. It looks like you are checking against means and quantiles of the *count* and *value* columns, but what you mean by *unassign* is not clear. – Parfait Feb 03 '17 at 20:13
  • @Gregor Let me clean this up and make it more understandable - sorry for the opacity. – Matt W. Feb 03 '17 at 21:11
  • In your revision, please be specific in words about what "too many accounts" or "too high value accounts" mean. Currently, your loop looks at the current value of the average, and anyone above that is too high. This isn't a good criterion because there will *always* be at least one account above the average value unless all account values are the same. If account values are 900, 901, 901, 901, the average is 900.75 and all the 901 accounts would be eliminated. – Gregor Thomas Feb 03 '17 at 21:30
  • @Gregor So in my data I may have 15 names of people each with 50 accounts to their name, and another 15 people with 0 accounts. And then there is a pool of 10,000 accounts of all different values. If we're looking at the territory we have 30 people, ~11,000 accounts, each person should have roughly 367 accounts, and each person's territory value size should be roughly even as well. The problem is that in the actual data, I have some people who already have 400 accounts, and others with 100, or 0. I want to remove some of the 400 accounts so their territory shrinks down. – Matt W. Feb 03 '17 at 22:12
  • @Gregor I edited my whole question - please let me know if I'm still unclear! – Matt W. Feb 03 '17 at 22:12
  • @Parfait what I meant by unassign -- I tried to make it more clear in my edited version of my question above. Let me know if I addressed it, if not I can go into more detail or clear some things up. – Matt W. Feb 03 '17 at 22:14
  • @Gregor to address your previous question further, the average will be the average # of accounts and total value of accounts that each person should have. There are people who have 0 accounts and $0 value, and there is also a large pool of accounts that are unassigned already. This makes the average size a lot bigger, but still there are some that have too many accounts and too large total value of accounts. – Matt W. Feb 03 '17 at 22:16

1 Answer


Please try to start from something like this:

df1 <- data.frame(
  N  = c("Sarah", "Jeff", "Cameron", "Matt", "Cindy"),
  Q  = c(44, 111, 46, 0, 0),
  M1 = c(110000, 541000, 201000, 0, 0),   # dollar values kept as plain numbers
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  A  = c("Acc1", "Acc2", "Acc3", "Acc4", "Acc5"),
  N  = c("Sarah", "Sarah", "Jeff", "Cameron", "Jeff"),
  M2 = c(1200, 900, 700, 880, 4500),
  stringsAsFactors = FALSE
)

merge(df1, df2, by = "N")

This will create one data.frame by merging the two on column N (the only column present in both data frames).

Qbik
  • I think my phrasing was a bit off. I'm not trying to merge the data but to iterate through the name list (your df1), look at all the individual accounts (your df2), and turn the name (your N column) into an NA value if their score is too high. – Matt W. Feb 04 '17 at 15:40