2

Reproducible tibble: I have a database similar to the one shown below. The difference being that the database I'm working with is much bigger.

general_tibble <- tibble(gender = c("female", "female", "male"),
                             age = c(18, 19,18),
                             age_partner = c(22,20,17),
                             max_age = c(60, 60, 65), 
                             nrs =c(42,41,47))

general_tibble results in :

  gender age age_partner max_age nrs
1 female  18          22      60  42
2 female  19          20      60  41
3   male  18          17      65  47

Question: How do I create a new table from a previous table, that takes the value of nrs, and creates a column variable, called n, that goes from 0 to the value in nrs?

To illustrate further, in row 1 of general_tibble the column nrs is equal to 42 so the column would go from 0 to 42, in row 2 nrsis equal to 41 so the column would go from 0 to 41, and the same for row 3.

I am currently using the code below. It works, but when general_tibble is too big, the code performs really slow.

general_list <- list()

for(i in 1:NROW(general_tibble)){
  general_list[[i]] <- data.frame(general_tibble[i, ], 
                             n = 0:general_tibble[[i, "nrs"]])
} 

Then I bind_rows() general_list to obtain general_binded

general_binded <- bind_rows(general_list)

general_binded[c(1:5, 38:42),] results in :

   gender age age_partner max_age nrs  n
1  female  18          22      60  42  0
2  female  18          22      60  42  1
3  female  18          22      60  42  2
4  female  18          22      60  42  3
5  female  18          22      60  42  4
38 female  18          22      60  42 37
39 female  18          22      60  42 38
40 female  18          22      60  42 39
41 female  18          22      60  42 40
42 female  18          22      60  42 41

PS: In the for loop I use data.frame() instead of tibble() because i want to recycle the rows. If you have some sort of advice involving tibbles or dataframes, ill gladly take it.

M--
  • 25,431
  • 8
  • 61
  • 93
Dennis Aguilar
  • 113
  • 1
  • 7
  • Seems related [Faster equivalent to group_by %>% expand in R](https://stackoverflow.com/questions/55103276/faster-equivalent-to-group-by-expand-in-r) – markus May 21 '19 at 19:22
  • https://stackoverflow.com/questions/2894775/replicate-each-row-of-data-frame-and-specify-the-number-of-replications-for-each – M-- May 21 '19 at 20:03

5 Answers5

5

Easiest approach towards this would be expanding the general_tibble on nrs column using tidyr::expand() function:

library(tidyverse)

general_tibble %>% 
        group_by_all()%>% 
        expand(n = 0:nrs)

#> # A tibble: 133 x 6
#> # Groups:   gender, age, age_partner, max_age, nrs [3]
#>    gender   age age_partner max_age   nrs     n
#>    <chr>  <dbl>       <dbl>   <dbl> <dbl> <int>
#>  1 female    18          22      60    42     0
#>  2 female    18          22      60    42     1
#>  3 female    18          22      60    42     2
#>  4 female    18          22      60    42     3
#>  5 female    18          22      60    42     4
#>  6 female    18          22      60    42     5
#>  7 female    18          22      60    42     6
#>  8 female    18          22      60    42     7
#>  9 female    18          22      60    42     8
#> 10 female    18          22      60    42     9
#> # ... with 123 more rows

Created on 2019-05-21 by the reprex package (v0.2.1)


Just another idea using only base R functions:

expanded_vars <- do.call(rbind,lapply(general_tibble$nrs, 
                                              function(x) expand.grid(x, 0:x)))
names(expanded_vars) <- c("nrs", "n")

merge(y = expanded_vars, x = general_tibble, by = "nrs", all = TRUE)
M--
  • 25,431
  • 8
  • 61
  • 93
  • Hi, the `group_by_all() %>% expand()` worked great. However, I'm now facing the problem that it takes around 15 minutes to expand a tibble of about 600,000 rows. What do you suggest? – Dennis Aguilar Jun 30 '19 at 16:04
  • @DennisAguilar give Toucan's data.table solution a shot, since data.table is generally faster. Another idea at the moment is splitting after grouping, expanding and then rowbinding (not sure if that would help tho). – M-- Jun 30 '19 at 16:07
4

One nice thing about using data.table vs tidyverse is that you don't need to think of operations in terms of whether what you're doing is a mutate, expand, or summarize. You can just put what you want in the j part of df[i, j, k] and however many rows that resolves to, that's what you get.

library(data.table)
setDT(general_tibble)

general_tibble[, .(n = seq(0, nrs))
               , by = names(general_tibble)]


#      gender age age_partner max_age nrs  n
#   1: female  18          22      60  42  0
#   2: female  18          22      60  42  1
#   3: female  18          22      60  42  2
#   4: female  18          22      60  42  3
#   5: female  18          22      60  42  4
#  ---                                      
# 129:   male  18          17      65  47 43
# 130:   male  18          17      65  47 44
# 131:   male  18          17      65  47 45
# 132:   male  18          17      65  47 46
# 133:   male  18          17      65  47 47
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
2

We may use uncount

library(tidyverse)
general_tibble %>% 
   mutate(grp = row_number(), nrsN = nrs + 1) %>% 
   uncount(nrsN) %>%
   group_by(grp) %>% 
   mutate(n = row_number() - 1) %>%
   ungroup %>%
   select(-grp)
# A tibble: 133 x 6
#   gender   age age_partner max_age   nrs     n
#   <chr>  <dbl>       <dbl>   <dbl> <dbl> <dbl>
# 1 female    18          22      60    42     0
# 2 female    18          22      60    42     1
# 3 female    18          22      60    42     2
# 4 female    18          22      60    42     3
# 5 female    18          22      60    42     4
# 6 female    18          22      60    42     5
# 7 female    18          22      60    42     6
# 8 female    18          22      60    42     7
# 9 female    18          22      60    42     8
#10 female    18          22      60    42     9
# … with 123 more rows

Another option is unnest

general_tibble %>% 
   mutate(n = map(nrs+1, ~  seq(.x) - 1)) %>%
   unnest
akrun
  • 874,273
  • 37
  • 540
  • 662
1

One way with base R (minus the tibble package).

First, split by the nrs group. Second, expand the rows of each dataframe by the nrs value. Third, create an id column that represents 0:whatever number of rows. Fourth, bring it back to a tibble:

library(tibble)

df <- tibble(
  gender      = c("female", "female", "male"),
  age         = c(18, 19, 18),
  age_partner = c(22, 20, 17),
  max_age     = c(60, 60, 65), 
  nrs         = c(42, 41, 47)
  )

nrs_split <- split(df, df$nrs)
df_list <- lapply(nrs_split, function(i) i[rep(seq_len(nrow(i)), each=i$nrs + 1), ])
df_renum <- lapply(df_list, function(i) {i$id <- 0:rle(i$nrs)$values; return(i)})
df <- do.call("rbind", df_renum)
df
#> # A tibble: 133 x 6
#>    gender   age age_partner max_age   nrs    id
#>  * <chr>  <dbl>       <dbl>   <dbl> <dbl> <int>
#>  1 female    19          20      60    41     0
#>  2 female    19          20      60    41     1
#>  3 female    19          20      60    41     2
#>  4 female    19          20      60    41     3
#>  5 female    19          20      60    41     4
#>  6 female    19          20      60    41     5
#>  7 female    19          20      60    41     6
#>  8 female    19          20      60    41     7
#>  9 female    19          20      60    41     8
#> 10 female    19          20      60    41     9
#> # … with 123 more rows
tyluRp
  • 4,678
  • 2
  • 17
  • 36
1

With dplyr and tidyr, you can also do:

general_tibble %>%
 group_by(rowid = row_number()) %>%
 mutate(n = nrs) %>%
 complete(n = seq(0, n, 1)) %>%
 fill(everything(), .direction = "up") %>%
 ungroup() %>%
 select(-rowid)

       n gender   age age_partner max_age   nrs
   <dbl> <chr>  <dbl>       <dbl>   <dbl> <dbl>
 1     0 female    18          22      60    42
 2     1 female    18          22      60    42
 3     2 female    18          22      60    42
 4     3 female    18          22      60    42
 5     4 female    18          22      60    42
 6     5 female    18          22      60    42
 7     6 female    18          22      60    42
 8     7 female    18          22      60    42
 9     8 female    18          22      60    42
10     9 female    18          22      60    42
tmfmnk
  • 38,881
  • 4
  • 47
  • 67