creating a function that takes into multiple dataframe columns and spilt value across the columns

Question

I have a sales dataframe with customer data and sales team. I have a target calls based on the individual customer which I want to split across the sales teams

 cust_id| total_calls_req| group_1_rep| group_2_rep| group_3_rep
 34523  |    10          | 230429     | nan        | 583985
 34583  |    12          | 230429     | 539409     | 583985
 34455  |    6           | 135552     | nan        | nan

I want to create a function that splits the total_calls_req across each group based on whether or not there is a group_rep assigned.

If cust_id is assigned to 1 rep then the total_calls_req is all assigned to the rep in question

If cust_id is assigned to 2 reps then the total_calls_req is split between the two reps in question.

If cust_id is assigned to 3 reps then the total_calls_req is split randomly between the three reps in question and needs to be whole cards.

I want the end dataframe to look like this:

 cust_id| total_calls_req| group_1_rep| group_2_rep| group_3_rep| group_1_rep_calls| group_2_rep_calls| group_3_rep_calls
 34523  |    10          | 230429     | nan        | 583985     |   5              | 0                |   5
 34583  |    12          | 230429     | 539409     | 583985     |   6              | 3                |   3
 34455  |    6           | 135552     | nan        | nan        |   6              | 0                |   0

Is there a way I can do that through a python function?

So the case with 3 reps and the random split means any combination of calls that add up to 12 (in the example)? You show 6-3-3. Could it be 10-1-1 or 0-4-8, etc? And what if the 2-rep case has an odd number of calls? — jch, Jul 19 '21 at 18:14
@jch Yes it can be any random combination. If it is odd then you want split to either but it needs to be whole numbers adding up to the total — dsexplorer, Jul 19 '21 at 19:48
@dsexplorer any feedback on the answers? Has one of them solved your problem? — mozway, Jul 25 '21 at 11:40

mozway · Answer 1 · 2021-07-20T12:40:34.583

You can use this custom split function to split the calls among reps that are assigned to the customer. This uses the columns starting in "group_" to identify the assigned reps and count how many they are. In case they are more than two, the numpy.random.multinomial function enables to generate a random split.

import numpy as np

def split(s):
    reps = (~s.filter(like='group_').isna()).astype(int).add_suffix('_calls')
    total = reps.sum()
    if total > 2:   # remove this line and below for a better split across reps
        return np.random.multinomial(s['total_calls_req'], [1/total]*total)
    div, mod = divmod(int(s['total_calls_req']), total)
    reps = reps*div # split evenly
    reps[np.random.choice(np.flatnonzero(reps), mod)]+=1 # allocate remainder randomly
    return reps

pd.concat([df, df.apply(split, axis=1)], axis=1)

output:

   cust_id  total_calls_req  group_1_rep  group_2_rep  group_3_rep  group_1_rep_calls  group_2_rep_calls  group_3_rep_calls
0    34523               10       230429          NaN     583985.0                  5                  0                  5
1    34583               12       230429     539409.0     583985.0                  4                  2                  6
2    34455                6       135552          NaN          NaN                  6                  0                  0

Thanks for this. How can i split odd values so i get whole numbers. Also I changed the astype to string to help capture any nulls. — dsexplorer, Jul 19 '21 at 22:20
I updated the code for that. You could simplify it by removing the `if total > 2:` line and the one after. It would result in fair splits in all cases (even if reps > 2) — mozway, Jul 20 '21 at 07:32

Carmoreno · Accepted Answer · 2021-07-22T01:40:33.760

You can build a function which returns a serie with three elements according to the number of NaN values. I have based in this answer to get the Series, in that answer is used numpy.random.multinomial.

import numpy as np    

def serie_split(row):
  total_calls_req = row[0]
  groups = row[1:]
  numbers_nan = pd.notna(groups).sum()
  if numbers_nan == len(groups):
    s = pd.Series(np.random.multinomial(total_calls_req, [1/len(groups)] * len(groups)))
  else:
    s = pd.Series(groups)
    s.loc[s.notna()] = total_calls_req/numbers_nan if (total_calls_req % 2) == 0 else np.random.multinomial(total_calls_req, [1/numbers_nan] * numbers_nan)
    s.loc[s.isna()] = 0
  return s

def get_rep_calls(df):
  columns = df.filter(like='group_').add_suffix('_calls').columns
  dfg = df[df.columns[1:]] # dfg is a dataframe only with the columns 'total_calls_req','group_1_rep', 'group_2_rep' and 'group_3_rep'
  series = [serie_split(row) for row in dfg.to_numpy(dtype='object')]
  for index in range(len(dfg)):
    df.loc[index, columns] = series[index].values

get_rep_calls(df)
print(df)

Output (I have added an example in the last row with total_calls_req = 13):

cust_id	total_calls_req	group_1_rep	group_2_rep	group_3_rep	group_1_rep_calls	group_2_rep_calls	group_3_rep_calls
34523	10	230429	NaN	583985	5	0	5
34583	12	230429	539409	583985	4	7	1
34455	6	135552	NaN	NaN	6	0	0
12345	13	123456	NaN	583985	10	0	3

Thanks for this. How can i split odd values so i get whole numbers and still add to the total? Is there something i can add to this line `s.loc[s.isna()] = total_calls_req / numbers_nan` — dsexplorer, Jul 19 '21 at 22:29
You could do something with `np.random.multinomial`. Give me one second.. — Carmoreno, Jul 19 '21 at 22:45

creating a function that takes into multiple dataframe columns and spilt value across the columns

2 Answers2

Linked