
I have a dataframe with these 3 columns, and I want to create random values for each row. The random value for each row should be between `start` and `end`, and the number of random values should equal the value in the `times` column of the dataframe.

> Example:
start   end     times
82716683 82730328  1 
11106535 11262507  3 
Output:
  start   end     times  random
82716683 82730328  1     82716965
11106535 11262507  3     11069855
11106535 11262507  3     11115562
11106535 11262507  3     11185696

I am new to R. I am trying to do it with loops, but the main file is really big, so I am looking for a better solution. Please let me know if you have any ideas.

  • Not sure it makes sense to close this. It's not an exact copy of [this answer](https://stackoverflow.com/questions/2894775/repeat-each-row-of-data-frame-the-number-of-times-specified-in-a-column) since it also needs to generate a random number per observation. – geoff Jan 12 '22 at 14:27
  • In any case, here is a `tidyverse` and `purrr` solution:
    ```
    library(tidyverse)
    library(purrr)
    df <- tibble(start = c(82716683, 11106535),
                 end = c(82730328, 11262507),
                 times = c(1, 3))
    df |>
      mutate(times = map(times, ~rep(., .))) |>
      unnest_longer(times) |>
      mutate(random = map2_dbl(start, end, ~sample(..1:..2, 1)))
    ```
    – geoff Jan 12 '22 at 14:27
  • But the main file is really big, with over a million records, and the solution you mentioned will require a loop over the dataframe. Would `apply` not be useful here? – Nikita Srivastav Jan 12 '22 at 14:33
  • There's nothing manual here; I just wrote out the DF. `map` does basically the same thing as `apply`. Neither is magic, and the runtime will still be slower with millions of records, but both offer speedups over for loops. Another `map` way to do it would be `pmap(list(df$start, df$end, df$times), ~round(runif(..3, ..1, ..2)))` – geoff Jan 12 '22 at 14:47
  • Thank you so much! I think things will work out well for me with `pmap`. If you don't mind, could you briefly explain what `~round(runif(..3, ..1, ..2))` is doing, and the values 3, 1, 2 in particular? Thank you in advance. – Nikita Srivastav Jan 12 '22 at 15:10
  • No worries. `purrr` syntax is extremely confusing when you first start, so don't feel bad if it takes you a while. Whenever you use `map`, you first have to think about how many inputs you have. `map` is for 1 input, `map2` for 2, and `pmap` for an arbitrary number. When you use `pmap`, you must provide a list. Then, to refer to each of the elements in that list, you use `..1`, `..2`, etc., in the order that they appear in the list. `..1` is the first element (`df$start`), `..2` is the second (`df$end`), and so forth. – geoff Jan 12 '22 at 15:14
  • Note that for the function argument in `map` functions, you have two options. I used the shorthand with ~ where you specify arguments that way, but this chunk of code is identical: `pmap(list(df$start, df$end, df$times), .f = function(start, end, n){ round(runif(n, start, end)) })`. If you prefer readability over conciseness, you have this option. – geoff Jan 12 '22 at 15:16
  • You are such a blessing! Thank you so much! Great, great help! Really appreciate it. – Nikita Srivastav Jan 12 '22 at 15:16
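
Since the thread raises the concern about a file with over a million rows, here is a minimal base R sketch of a fully vectorized alternative to the `map`/`pmap` calls discussed above. It is not taken from the comments: it relies on `rep()` to expand the row indices and on `runif()` being vectorized over `min` and `max`. The names `idx` and `out` are made up for illustration, and using `floor(... + 1)` to get whole numbers with both bounds included is an assumption about what the question wants.

```r
# Example data copied from the question.
df <- data.frame(
  start = c(82716683, 11106535),
  end   = c(82730328, 11262507),
  times = c(1, 3)
)

# Repeat each row index `times` times, giving 1, 2, 2, 2 for this data.
idx <- rep(seq_len(nrow(df)), times = df$times)
out <- df[idx, ]
rownames(out) <- NULL

# runif() accepts vectors for min and max, so a single call draws one
# value per expanded row without any explicit loop.
set.seed(42)  # only so that the sketch is reproducible
# Assumption: whole numbers in [start, end] are wanted, hence floor(... + 1).
out$random <- floor(runif(nrow(out), min = out$start, max = out$end + 1))

print(out)
```

The result has one row per requested random value, with the original `start`, `end`, and `times` columns repeated alongside the new `random` column.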
