1

I have a sales dataframe with customer data and sales team. I have a target calls based on the individual customer which I want to split across the sales teams

 cust_id| total_calls_req| group_1_rep| group_2_rep| group_3_rep
 34523  |    10          | 230429     | nan        | 583985
 34583  |    12          | 230429     | 539409     | 583985
 34455  |    6           | 135552     | nan        | nan

I want to create a function that splits the total_calls_req across each group based on whether or not there is a group_rep assigned.

If cust_id is assigned to 1 rep then the total_calls_req is all assigned to the rep in question

If cust_id is assigned to 2 reps then the total_calls_req is split between the two reps in question.

If cust_id is assigned to 3 reps then the total_calls_req is split randomly between the three reps in question and needs to be whole cards.

I want the end dataframe to look like this:

 cust_id| total_calls_req| group_1_rep| group_2_rep| group_3_rep| group_1_rep_calls| group_2_rep_calls| group_3_rep_calls
 34523  |    10          | 230429     | nan        | 583985     |   5              | 0                |   5
 34583  |    12          | 230429     | 539409     | 583985     |   6              | 3                |   3
 34455  |    6           | 135552     | nan        | nan        |   6              | 0                |   0

Is there a way I can do that through a python function?

dsexplorer
  • 105
  • 8
  • So the case with 3 reps and the random split means any combination of calls that add up to 12 (in the example)? You show 6-3-3. Could it be 10-1-1 or 0-4-8, etc? And what if the 2-rep case has an odd number of calls? – jch Jul 19 '21 at 18:14
  • @jch Yes it can be any random combination. If it is odd then you want split to either but it needs to be whole numbers adding up to the total – dsexplorer Jul 19 '21 at 19:48
  • @dsexplorer any feedback on the answers? Has one of them solved your problem? – mozway Jul 25 '21 at 11:40

2 Answers2

0

You can use this custom split function to split the calls among reps that are assigned to the customer. This uses the columns starting in "group_" to identify the assigned reps and count how many they are. In case they are more than two, the numpy.random.multinomial function enables to generate a random split.

import numpy as np

def split(s):
    reps = (~s.filter(like='group_').isna()).astype(int).add_suffix('_calls')
    total = reps.sum()
    if total > 2:   # remove this line and below for a better split across reps
        return np.random.multinomial(s['total_calls_req'], [1/total]*total)
    div, mod = divmod(int(s['total_calls_req']), total)
    reps = reps*div # split evenly
    reps[np.random.choice(np.flatnonzero(reps), mod)]+=1 # allocate remainder randomly
    return reps

pd.concat([df, df.apply(split, axis=1)], axis=1)

output:

   cust_id  total_calls_req  group_1_rep  group_2_rep  group_3_rep  group_1_rep_calls  group_2_rep_calls  group_3_rep_calls
0    34523               10       230429          NaN     583985.0                  5                  0                  5
1    34583               12       230429     539409.0     583985.0                  4                  2                  6
2    34455                6       135552          NaN          NaN                  6                  0                  0
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Thanks for this. How can i split odd values so i get whole numbers. Also I changed the astype to string to help capture any nulls. – dsexplorer Jul 19 '21 at 22:20
  • I updated the code for that. You could simplify it by removing the `if total > 2:` line and the one after. It would result in fair splits in all cases (even if reps > 2) – mozway Jul 20 '21 at 07:32
0

You can build a function which returns a serie with three elements according to the number of NaN values. I have based in this answer to get the Series, in that answer is used numpy.random.multinomial.

import numpy as np    

def serie_split(row):
  total_calls_req = row[0]
  groups = row[1:]
  numbers_nan = pd.notna(groups).sum()
  if numbers_nan == len(groups):
    s = pd.Series(np.random.multinomial(total_calls_req, [1/len(groups)] * len(groups)))
  else:
    s = pd.Series(groups)
    s.loc[s.notna()] = total_calls_req/numbers_nan if (total_calls_req % 2) == 0 else np.random.multinomial(total_calls_req, [1/numbers_nan] * numbers_nan)
    s.loc[s.isna()] = 0
  return s

def get_rep_calls(df):
  columns = df.filter(like='group_').add_suffix('_calls').columns
  dfg = df[df.columns[1:]] # dfg is a dataframe only with the columns 'total_calls_req','group_1_rep', 'group_2_rep' and 'group_3_rep'
  series = [serie_split(row) for row in dfg.to_numpy(dtype='object')]
  for index in range(len(dfg)):
    df.loc[index, columns] = series[index].values

get_rep_calls(df)
print(df)

Output (I have added an example in the last row with total_calls_req = 13):

cust_id total_calls_req group_1_rep group_2_rep group_3_rep group_1_rep_calls group_2_rep_calls group_3_rep_calls
34523 10 230429 NaN 583985 5 0 5
34583 12 230429 539409 583985 4 7 1
34455 6 135552 NaN NaN 6 0 0
12345 13 123456 NaN 583985 10 0 3
Carmoreno
  • 1,271
  • 17
  • 29