Reshaping data using tidyr

Question

I am working with a data frame, dat, that is similar in structure to the one below.

  Gender         Age Number
1 Female 55-59 years      5
2 Female   65+ years     10
3   Male 25-29 years      4
4   Male 40-44 years      3
5   Male 50-54 years      1

I am trying (so far unsuccessfully) to reshape the data with tidyr so that each row is repeated as many times as its Number value, with the Number column itself dropped. The output I am after looks like this:

   Gender         Age
1  Female 55-59 years
2  Female 55-59 years
3  Female 55-59 years
4  Female 55-59 years
5  Female 55-59 years
6  Female   65+ years
7  Female   65+ years
8  Female   65+ years
9  Female   65+ years
10 Female   65+ years
11 Female   65+ years
12 Female   65+ years
13 Female   65+ years
14 Female   65+ years
15 Female   65+ years
16   Male 25-29 years
17   Male 25-29 years
18   Male 25-29 years
19   Male 25-29 years
20   Male 40-44 years
21   Male 40-44 years
22   Male 40-44 years
23   Male 50-54 years

I have tried to use various combinations of the gather/spread functions without coming even remotely close to success. I'm fairly sure this is possible in tidyr!

I know there are a number of other packages/functions that I could use to achieve the same result, but I'm quite keen to get a tidyr solution so I can include it in a larger dplyr/tidyr pipe.

Any help or assistance would be very much appreciated.

dat <- structure(list(Gender = structure(c(3L, 3L, 1L, 2L, 1L), .Label = c("   Male", 
    " Male", "Female"), class = "factor"), Age = structure(c(5L, 
    1L, 2L, 3L, 4L), .Label = c("65+ years", "25-29 years", "40-44 years", 
    "50-54 years", "55-59 years"), class = "factor"), Number = c(5L, 
    10L, 4L, 3L, 1L)), .Names = c("Gender", "Age", "Number"), class = "data.frame", row.names = c(NA, 
    -5L))

Why not just use `rep()`? You can easily do `with(df, data.frame(Gender = rep(Gender, Number), Age = rep(Age, Number)))` — Rich Scriven, Oct 23 '15 at 01:22
With base R: http://stackoverflow.com/questions/2894775/replicate-each-row-of-data-frame-and-specify-the-number-of-replications-for-each — , Oct 23 '15 at 01:48
Or just `library(splitstackshape) ; expandRows(df, "Number")` — David Arenburg, Oct 23 '15 at 06:43

Frank · Accepted Answer · 2015-10-23T02:08:08.397

5

This is also not using tidyr, but I think it's natural:

dat %>% slice(rep(row_number(), Number)) %>% select(-Number)

    Gender         Age
1   Female 55-59 years
2   Female 55-59 years
3   Female 55-59 years
4   Female 55-59 years
5   Female 55-59 years
6   Female   65+ years
7   Female   65+ years
8   Female   65+ years
9   Female   65+ years
10  Female   65+ years
11  Female   65+ years
12  Female   65+ years
13  Female   65+ years
14  Female   65+ years
15  Female   65+ years
16    Male 25-29 years
17    Male 25-29 years
18    Male 25-29 years
19    Male 25-29 years
20    Male 40-44 years
21    Male 40-44 years
22    Male 40-44 years
23    Male 50-54 years

As @bramtayl suggested, one can (arguably) improve readability with

dat %>% slice(row_number() %>% rep(Number)) %>% select(-Number)
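
To make the mechanism concrete, here is a minimal base R sketch of the index vector that `slice()` ends up receiving. The data frame `toy` is a hypothetical, shortened stand-in for `dat`, not the asker's data.

```r
# Hypothetical toy data (not the original dat)
toy <- data.frame(
  Gender = c("Female", "Female", "Male"),
  Age    = c("55-59 years", "65+ years", "25-29 years"),
  Number = c(2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# With a vector as the second argument, rep() repeats each element
# of the first vector the corresponding number of times
idx <- rep(seq_len(nrow(toy)), toy$Number)
idx
# 1 1 2 2 2 3

# Indexing rows by idx duplicates them; the count column is then dropped
expanded <- toy[idx, c("Gender", "Age")]
rownames(expanded) <- NULL
```

`slice()` does the same row selection inside a pipe, with `row_number()` playing the role of `seq_len(nrow(toy))`.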

edited Oct 23 '15 at 02:08

answered Oct 23 '15 at 01:58


1

Not **tidyr** but within the hadleyverse framework. Nicely done +1 – Tyler Rinker Oct 23 '15 at 02:05
3

or `dat %>% slice(n() %>% seq %>% rep(Number))` – bramtayl Oct 23 '15 at 02:06
@bramtayl I just realized `n() %>% seq` is the same as `row_number()` thanks to your comment. I've added your way with that change and edited mine as well. – Frank Oct 23 '15 at 02:11
Nice approach there ! I had `df %>% do(data_frame(Gender = rep(.$Gender, .$Number), Age = rep(.$Age, .$Number)))`. – Steven Beaupré Oct 23 '15 at 11:05

score 4 · Answer 2 · answered Oct 23 '15 at 01:24

Not tidyr but pretty fast and efficient:

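# Repeat each row index Number times and keep only the first two columns
dat2 <- dat[rep(seq_len(nrow(dat)), dat[["Number"]]), 1:2]
# Reset the duplicated row names to 1..n
rownames(dat2) <- NULL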

##     Gender          Age
## 1   Female  55-59 years
## 2   Female  55-59 years
## 3   Female  55-59 years
## 4   Female  55-59 years
## 5   Female  55-59 years
## 6   Female    65+ years
## 7   Female    65+ years
## 8   Female    65+ years
## 9   Female    65+ years
## 10  Female    65+ years
## 11  Female    65+ years
## 12  Female    65+ years
## 13  Female    65+ years
## 14  Female    65+ years
## 15  Female    65+ years
## 16    Male  25-29 years
## 17    Male  25-29 years
## 18    Male  25-29 years
## 19    Male  25-29 years
## 20    Male  40-44 years
## 21    Male  40-44 years
## 22    Male  40-44 years
## 23    Male  50-54 years

Thanks @TylerRinker - it's a nice tidy solution. I'm really keen to see if anyone can find a tidyr solution though. I'm trying to get a better understanding of the syntax and what is possible/not possible with it. I thought others might find that useful too... — vengefulsealion, Oct 23 '15 at 01:44
@vengefulsealion - I don't think there are any functions in tidyr that replicate rows based on the value in a column — Rich Scriven, Oct 23 '15 at 01:47

akrun · Answer 3 · 2015-10-23T07:16:45.193

We could do this using tidyr/dplyr. Convert 'Number' to a list column in which each count is replaced by the sequence from 1 to that count, unnest it so that every element of the sequence gets its own row, and then remove the 'Number' column from the output with select.

library(dplyr)
library(tidyr)
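dat1 <- dat %>% 
          # turn each count into the sequence 1..count (seq_len(0) is empty)
          mutate(Number = lapply(Number, seq_len)) %>%
          # one row per element of each sequence
          unnest(Number) %>% 
          select(-Number)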
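
To see what each step of the pipe above contributes, the following sketch runs the stages one at a time. It uses a hypothetical toy data frame (`toy`) instead of `dat`, and the intermediate names `step1`, `step2` and `result` are only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the original dat)
toy <- data.frame(
  Gender = c("Female", "Male"),
  Age    = c("55-59 years", "25-29 years"),
  Number = c(2L, 3L),
  stringsAsFactors = FALSE
)

# Step 1: each count becomes a sequence, stored as a list column
step1 <- toy %>% mutate(Number = lapply(Number, seq_len))
step1$Number
# a list holding 1:2 for the first row and 1:3 for the second

# Step 2: unnest() gives every list element its own row,
# repeating Gender and Age alongside it
step2 <- step1 %>% unnest(Number)

# Step 3: the sequence values were only a vehicle, so drop them
result <- step2 %>% select(-Number)
```

The list column is just a way of getting `unnest()` to multiply rows; the values 1, 2, 3, ... are never used.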

Note that the output is a tbl_df, which is convenient when further dplyr operations follow.

str(dat1)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame':       23 obs. of  2 variables:
#  $ Gender: Factor w/ 3 levels "   Male"," Male",..: 3 3 3 3 3 3 3 3 3 3 ...
#  $ Age   : Factor w/ 5 levels "65+ years","25-29 years",..: 5 5 5 5 5 1 1 1 1 1 ...

dat1 %>%
     as.data.frame()
#   Gender         Age
#1   Female 55-59 years
#2   Female 55-59 years
#3   Female 55-59 years
#4   Female 55-59 years
#5   Female 55-59 years
#6   Female   65+ years
#7   Female   65+ years
#8   Female   65+ years
#9   Female   65+ years
#10  Female   65+ years
#11  Female   65+ years
#12  Female   65+ years
#13  Female   65+ years
#14  Female   65+ years
#15  Female   65+ years
#16    Male 25-29 years
#17    Male 25-29 years
#18    Male 25-29 years
#19    Male 25-29 years
#20    Male 40-44 years
#21    Male 40-44 years
#22    Male 40-44 years
#23    Male 50-54 years
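
For readers on a more recent tidyr than the one available when this thread was written: later releases include `uncount()`, which repeats rows according to a weights column and drops that column by default. A minimal sketch, assuming such a version is installed and again using a hypothetical toy data frame instead of `dat`:

```r
library(tidyr)

# Hypothetical toy data (not the original dat); note the zero count
toy <- data.frame(
  Gender = c("Female", "Female", "Male"),
  Age    = c("55-59 years", "65+ years", "25-29 years"),
  Number = c(2L, 0L, 3L),
  stringsAsFactors = FALSE
)

# Repeat each row Number times; the Number column is removed by default
expanded <- uncount(toy, Number)
```

Because it takes the data frame as its first argument, the same call should slot into a dplyr/tidyr pipe as `toy %>% uncount(Number)`. Rows with a count of zero simply disappear from the result.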
