
I am wondering if there is any tidyverse function that can make a dataset longer by repeating each row according to the value in one column. It's easier to explain with an example. Eventually I want to use this to turn a person-level survival dataset into a person-period dataset, but for now I just want to know this simple thing.

Here is the data. We have an id variable (id), a time-invariant predictor (sex), and a variable (years) that tells us how many observation periods each participant was observed for.

df <- data.frame(id = 1:3, 
                 sex = factor(c("m", "f", "f")),
                 years = c(4,5,3))
df

#   id sex years
# 1  1   m     4
# 2  2   f     5
# 3  3   f     3

Now I want to lengthen it so that the number of rows for each participant corresponds to the number in the years column: 4 for participant 1, 5 for participant 2, and 3 for participant 3.

So I would want it to look like this

df2 <- data.frame(id = c(rep(1,4), rep(2,5), rep(3,3)),
                  sex = rep(c("m", "f", "f"), c(4,5,3)))

df2

#    id sex
# 1   1   m
# 2   1   m
# 3   1   m
# 4   1   m
# 5   2   f
# 6   2   f
# 7   2   f
# 8   2   f
# 9   2   f
# 10  3   f
# 11  3   f
# 12  3   f

Is there a tidyverse function that can do this for me? (perhaps pivot_longer?)

llewmills

1 Answer


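Instead of `pivot_longer`, we can do this easily with `uncount` from tidyr, which repeats each row the number of times given in the `years` column (and drops that column by default)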

library(tidyr)
library(dplyr)
df %>% 
   uncount(years)

-output

#    id sex
#1   1   m
#2   1   m
#3   1   m
#4   1   m
#5   2   f
#6   2   f
#7   2   f
#8   2   f
#9   2   f
#10  3   f
#11  3   f
#12  3   f
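Because the eventual goal is a person-period dataset, it may help that `uncount()` also has a `.id` argument, which adds a column numbering the copies of each original row. A minimal sketch; the column name `period` is just an illustrative choice:

```r
library(tidyr)

df <- data.frame(id = 1:3,
                 sex = factor(c("m", "f", "f")),
                 years = c(4, 5, 3))

# Repeat each row 'years' times; .id numbers the copies 1, 2, ... within each original row
pp <- uncount(df, years, .id = "period")
pp
```

Each participant's `period` then runs from 1 up to their original `years` value (1-4, 1-5 and 1-3 here), which is the kind of time index a person-period file needs.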

Or using base R (R >= 4.1.0, for the native pipe `|>` and the `\(x)` lambda syntax)

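df$years |>
     {\(x) rep(seq_along(x), x)}() |>          # row index of each person, repeated 'years' times
     {\(i) `[`(df, i, c('id', 'sex'))}() |>    # subset df by those rows, keeping id and sex
     `row.names<-`(NULL)                       # reset the row names to 1..n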

-output

#    id sex
#1   1   m
#2   1   m
#3   1   m
#4   1   m
#5   2   f
#6   2   f
#7   2   f
#8   2   f
#9   2   f
#10  3   f
#11  3   f
#12  3   f
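To see what the pipeline does, the same steps can be run one at a time with ordinary assignments. This sketch needs no R 4.1 features; the intermediate names `idx` and `out` are only illustrative:

```r
df <- data.frame(id = 1:3,
                 sex = factor(c("m", "f", "f")),
                 years = c(4, 5, 3))

# Step 1: one row index per person, repeated 'years' times
idx <- rep(seq_along(df$years), df$years)
idx
# [1] 1 1 1 1 2 2 2 2 2 3 3 3

# Step 2: pick those rows, keeping only the id and sex columns
out <- df[idx, c("id", "sex")]

# Step 3: repeated indices give row names such as 1.1 and 1.2; reset them to 1..n
row.names(out) <- NULL
out
```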
akrun
  • Goodness gracious @akrun I understood none of the base R code. What does `|>` do for instance? Coming on this site is very humbling, in a good way. – llewmills May 24 '21 at 01:44
  • @llewmills it is similar to the `%>%` operator in `tidyverse`, except that we cannot use `.` as a placeholder for the data coming from the left of the operator. Here, we can use a lambda function instead (`function(x) x...`, or in the new version, `\(x) x...`). Basically, it extracts the 'years' column, replicates its sequence by those values, uses `[` to subset the rows of df by row/column index/names, and sets the row names to NULL – akrun May 24 '21 at 01:47
  • @llewmills we could make it simpler as `df[rep(seq_len(nrow(df)), df$years), c('id', 'sex')]` but I thought I'd add this as something new – akrun May 24 '21 at 01:47
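The two pieces of syntax discussed in these comments can be tried in isolation (R >= 4.1.0). `x |> f(y)` is evaluated as `f(x, y)`, and `\(x)` is shorthand for `function(x)`. Wrapping an anonymous function in parentheses and calling it, `(\(x) ...)()`, is a common way to put one inside a native pipe; the variable names below are only for illustration:

```r
# The native pipe inserts the left-hand side as the first argument
a <- c(1, 4, 9) |> sqrt()    # same as sqrt(c(1, 4, 9))

# \(x) is shorthand for function(x)
double_it <- \(x) x * 2
b <- double_it(5)

# An anonymous function in a pipe has to be called: (\(x) ...)()
years <- c(4, 5, 3)
idx <- years |> (\(x) rep(seq_along(x), x))()
```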