1

I'm trying to create hypothetical data for a predictive analysis. I have a set of averages I plan to apply my predictive model to. However, I want to expand my DF with various values in the Agent_Count column. Is there a way to copy columns 2-4 to new rows while changing the value of column 1, in those new rows, to a new hypothetical value. I would like to do this say 20-30 times.

Here is my data.

Group             Agent_Count HOUR Answered_Calls Aban Total_Calls
Clinical Support    11.75      9    52.69         2.77  56.65
PW Reset            12.06      9    53.79         22.27 81.98
Technical Support   21.15      9    81.02         2.22  84.20
  • 1
    Could you provide an example of what you want your desired output would look like if it were copied 2-3 times? I'm a little unclear how the other values should be copied – jpsmith Mar 21 '23 at 17:28
  • Sure Group Agent_Count HOUR Answered_Calls Aban Total_Calls Clinical Support 11.75 9 52.69 2.77 56.65 PW Reset 12.06 9 53.79 22.27 81.98 Technical Support 21.15 9 81.02 2.22 84.20 Clinical Support 12.75 9 52.69 2.77 56.65 PW Reset 13.06 9 53.79 22.27 81.98 Technical Support 22.15 9 81.02 2.22 84.20 – PeatyBoWeaty Mar 21 '23 at 17:33
  • Sorry, I didn't expect that to come out so ugly. Basically every column value should stay the same, except agent count should increase by 1 in each subsequent new row. – PeatyBoWeaty Mar 21 '23 at 17:34
  • @PeatyBoWeaty FYI, you can always update your questions through edit. In fact thats the recommended way how to update questions. – Andre Wildberg Mar 21 '23 at 17:50

5 Answers5

3

You could do this as a one-liner with

do.call(rbind, lapply(0:9, \(x) within(df, Agent_Count <- x + Agent_Count)))
#>                Group Agent_Count HOUR Answered_Calls  Aban Total_Calls
#> 1   Clinical Support       11.75    9          52.69  2.77       56.65
#> 2           PW Reset       12.06    9          53.79 22.27       81.98
#> 3  Technical Support       21.15    9          81.02  2.22       84.20
#> 11  Clinical Support       12.75    9          52.69  2.77       56.65
#> 21          PW Reset       13.06    9          53.79 22.27       81.98
#> 31 Technical Support       22.15    9          81.02  2.22       84.20
#> 12  Clinical Support       13.75    9          52.69  2.77       56.65
#> 22          PW Reset       14.06    9          53.79 22.27       81.98
#> 32 Technical Support       23.15    9          81.02  2.22       84.20
#> 13  Clinical Support       14.75    9          52.69  2.77       56.65
#> 23          PW Reset       15.06    9          53.79 22.27       81.98
#> 33 Technical Support       24.15    9          81.02  2.22       84.20
#> 14  Clinical Support       15.75    9          52.69  2.77       56.65
#> 24          PW Reset       16.06    9          53.79 22.27       81.98
#> 34 Technical Support       25.15    9          81.02  2.22       84.20
#> 15  Clinical Support       16.75    9          52.69  2.77       56.65
#> 25          PW Reset       17.06    9          53.79 22.27       81.98
#> 35 Technical Support       26.15    9          81.02  2.22       84.20
#> 16  Clinical Support       17.75    9          52.69  2.77       56.65
#> 26          PW Reset       18.06    9          53.79 22.27       81.98
#> 36 Technical Support       27.15    9          81.02  2.22       84.20
#> 17  Clinical Support       18.75    9          52.69  2.77       56.65
#> 27          PW Reset       19.06    9          53.79 22.27       81.98
#> 37 Technical Support       28.15    9          81.02  2.22       84.20
#> 18  Clinical Support       19.75    9          52.69  2.77       56.65
#> 28          PW Reset       20.06    9          53.79 22.27       81.98
#> 38 Technical Support       29.15    9          81.02  2.22       84.20
#> 19  Clinical Support       20.75    9          52.69  2.77       56.65
#> 29          PW Reset       21.06    9          53.79 22.27       81.98
#> 39 Technical Support       30.15    9          81.02  2.22       84.20

Created on 2023-03-21 with reprex v2.0.2

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
1

An option with dplyr/purrr:

library(dplyr)
library(purrr)

df <- tibble::tribble(
  ~Group, ~Agent_Count, ~HOUR, ~Answered_Calls, ~Aban, ~Total_Calls,
  "Clinical Support",        11.75,    9L,           52.69,  2.77,        56.65,
  "PW Reset",        12.06,    9L,           53.79, 22.27,        81.98,
  "Technical Support",        21.15,    9L,           81.02,  2.22,         84.2
)

  
map_df(0:30, function(x) {df %>% mutate(across(Agent_Count:Answered_Calls, ~. + x))})

#> # A tibble: 93 × 6
#>    Group             Agent_Count  HOUR Answered_Calls  Aban Total_Calls
#>    <chr>                   <dbl> <int>          <dbl> <dbl>       <dbl>
#>  1 Clinical Support         11.8     9           52.7  2.77        56.6
#>  2 PW Reset                 12.1     9           53.8 22.3         82.0
#>  3 Technical Support        21.2     9           81.0  2.22        84.2
#>  4 Clinical Support         12.8    10           53.7  2.77        56.6
#>  5 PW Reset                 13.1    10           54.8 22.3         82.0
#>  6 Technical Support        22.2    10           82.0  2.22        84.2
#>  7 Clinical Support         13.8    11           54.7  2.77        56.6
#>  8 PW Reset                 14.1    11           55.8 22.3         82.0
#>  9 Technical Support        23.2    11           83.0  2.22        84.2
#> 10 Clinical Support         14.8    12           55.7  2.77        56.6
#> # … with 83 more rows

Or, if just needing Agent_Count to increase:

map_df(0:30, function(x) {df %>% mutate(Agent_Count = Agent_Count + x)})

#> # A tibble: 93 × 6
#>    Group             Agent_Count  HOUR Answered_Calls  Aban Total_Calls
#>    <chr>                   <dbl> <int>          <dbl> <dbl>       <dbl>
#>  1 Clinical Support         11.8     9           52.7  2.77        56.6
#>  2 PW Reset                 12.1     9           53.8 22.3         82.0
#>  3 Technical Support        21.2     9           81.0  2.22        84.2
#>  4 Clinical Support         12.8     9           52.7  2.77        56.6
#>  5 PW Reset                 13.1     9           53.8 22.3         82.0
#>  6 Technical Support        22.2     9           81.0  2.22        84.2
#>  7 Clinical Support         13.8     9           52.7  2.77        56.6
#>  8 PW Reset                 14.1     9           53.8 22.3         82.0
#>  9 Technical Support        23.2     9           81.0  2.22        84.2
#> 10 Clinical Support         14.8     9           52.7  2.77        56.6
#> # … with 83 more rows
Matt
  • 7,255
  • 2
  • 12
  • 34
1

Here is a tidyverse approach:

library(purrr)
library(dplyr)

map_df(0:9, ~mutate(df, Agent_Count = rnorm(n(), 12, 2))) 
           Group Agent_Count HOUR Answered_Calls  Aban Total_Calls
1   Clinical Support   13.268501    9          52.69  2.77       56.65
2           PW Reset   12.847290    9          53.79 22.27       81.98
3  Technical Support   11.596324    9          81.02  2.22       84.20
4   Clinical Support   11.846268    9          52.69  2.77       56.65
5           PW Reset   13.374728    9          53.79 22.27       81.98
6  Technical Support   12.343263    9          81.02  2.22       84.20
7   Clinical Support   10.339783    9          52.69  2.77       56.65
8           PW Reset   11.419682    9          53.79 22.27       81.98
9  Technical Support    9.361749    9          81.02  2.22       84.20
10  Clinical Support   10.065936    9          52.69  2.77       56.65
11          PW Reset   11.710778    9          53.79 22.27       81.98
12 Technical Support    8.403735    9          81.02  2.22       84.20
13  Clinical Support    8.622915    9          52.69  2.77       56.65
14          PW Reset   14.205130    9          53.79 22.27       81.98
15 Technical Support   10.846762    9          81.02  2.22       84.20
16  Clinical Support    8.296617    9          52.69  2.77       56.65
17          PW Reset   11.774274    9          53.79 22.27       81.98
18 Technical Support   14.642139    9          81.02  2.22       84.20
19  Clinical Support   13.324509    9          52.69  2.77       56.65
20          PW Reset   12.882766    9          53.79 22.27       81.98
21 Technical Support   14.367492    9          81.02  2.22       84.20
22  Clinical Support   10.456997    9          52.69  2.77       56.65
23          PW Reset   13.459378    9          53.79 22.27       81.98
24 Technical Support   10.825829    9          81.02  2.22       84.20
25  Clinical Support   12.001528    9          52.69  2.77       56.65
26          PW Reset   16.428931    9          53.79 22.27       81.98
27 Technical Support   13.938869    9          81.02  2.22       84.20
28  Clinical Support   13.536015    9          52.69  2.77       56.65
29          PW Reset    9.783344    9          53.79 22.27       81.98
TarJae
  • 72,363
  • 6
  • 19
  • 66
0

In base R, you can "extend" the same data frame n times using the repeat() function:

reps <- 30 # repeat it 30 times
df_big <- do.call(rbind, replicate(reps, df, simplify = FALSE))

You can then replace new rows with new values. For instance, to replace rows 4 and onwards with +1 the previous value:

for(i in seq_len(nrow(df_big))){
  if(!(i %in% 1:3)){
  df_big[i,"Agent_Count"] <- df_big[i-1,"Agent_Count"]+1
}}

              Group Agent_Count HOUR Answered_Calls  Aban Total_Calls
Clinical    Support       11.75    9          52.69  2.77       56.65
PW            Reset       12.06    9          53.79 22.27       81.98
Technical   Support       21.15    9          81.02  2.22       84.20
Clinical1   Support       22.15    9          52.69  2.77       56.65
PW1           Reset       23.15    9          53.79 22.27       81.98
Technical1  Support       24.15    9          81.02  2.22       84.20
Clinical2   Support       25.15    9          52.69  2.77       56.65
PW2           Reset       26.15    9          53.79 22.27       81.98
Technical2  Support       27.15    9          81.02  2.22       84.20
Clinical3   Support       28.15    9          52.69  2.77       56.65
......

Or to replace the values with a random number (here, generated form a normal distribution with mean = 11 and sd = 5):

df_big[-c(1:3), "Agent_Count"] <- rnorm(reps*3-3, mean = 11, sd = 5)

              Group Agent_Count HOUR Answered_Calls  Aban Total_Calls
Clinical    Support  11.7500000    9          52.69  2.77       56.65
PW            Reset  12.0600000    9          53.79 22.27       81.98
Technical   Support  21.1500000    9          81.02  2.22       84.20
Clinical1   Support   6.9110877    9          52.69  2.77       56.65
PW1           Reset  10.7298854    9          53.79 22.27       81.98
Technical1  Support  12.6507080    9          81.02  2.22       84.20
Clinical2   Support  15.7766231    9          52.69  2.77       56.65
......
jpsmith
  • 11,023
  • 5
  • 15
  • 36
0

You could try a for loop with samples... This is all I can do without more info

OUTPUT<-NULL
for (i in 1:10){#repeat 30 times
  for(j in c("Clinical Support","PW Reset","Technical Support")){# repeat once for each group
    for (k in 8:18){#repeat for each hour
      Total_Calls <- sample(20:60,1) #sample total calls
      Answered_Calls <-sample(20:length(Total_Calls),1)#sample nswered calls
      Aban <- Total_Calls - Answered_Calls
      OUTPUT <- rbind(
        OUTPUT,
        data.frame(
          DAY = i,
          Group=j,
          Agent_Count=sample(2:10,1),
          HOUR = k,
          Answered_Calls=Answered_Calls,
          Aban=Aban,
          Total_Calls=Total_Calls
        )
      )
    }
  }
}
print(OUTPUT)