3

I'm trying to expand an R data table that looks like this:

a   step_num   duration 

1          1          5 
1          2          4
1          3          1
2          1          7
2          2          2
2          3          9
3          1          1
3          2          1
3          3          3

Into something that looks like this:

a | step_num | duration | 1_duration | 2_duration | 3_duration |
----------------------------------------------------------------
1          1          5            5           -            -
1          2          4            -           4            -
1          3          1            -           -            1
2          1          7            7           -            -
2          2          2            -           2            -
2          3          9            -           -            9
3          1          1            1           -            -
3          2          1            -           1            -
3          3          3            -           -            3

I'm wondering if there's an 'expand' function, so to speak, that would do this.

Thanks!

M--
  • 25,431
  • 8
  • 61
  • 93
MadMan
  • 65
  • 1
  • 7

4 Answers4

3

We can do this in base r.

cbind(df,
      reshape(df, idvar = c("a","step_num"), timevar = "step_num", direction = "wide")[,-1])

#>   a step_num duration duration.1 duration.2 duration.3
#> 1 1        1        5          5         NA         NA
#> 2 1        2        4         NA          4         NA
#> 3 1        3        1         NA         NA          1
#> 4 2        1        7          7         NA         NA
#> 5 2        2        2         NA          2         NA
#> 6 2        3        9         NA         NA          9
#> 7 3        1        1          1         NA         NA
#> 8 3        2        1         NA          1         NA
#> 9 3        3        3         NA         NA          3

Created on 2019-05-21 by the reprex package (v0.2.1)

M--
  • 25,431
  • 8
  • 61
  • 93
  • Awesome! Thanks for this simple answer. – MadMan May 21 '19 at 22:09
  • @MadMan You're welcome. If you want to get `-` instead of `NA`, which I do not recommend, look at this thread: https://stackoverflow.com/questions/19379081/how-to-replace-na-values-in-a-table-for-selected-columns-data-frame-data-tab – M-- May 21 '19 at 22:27
  • NA is the way to go. Just used the ``` - ``` for brevity. – MadMan May 22 '19 at 01:13
2

Here's an approach using dplyr and tidyr.

We take the original data and add on some columns by first adding a new column col which holds the column header we want, based on the step_num. Then we use tidyr::spread to put the durations into different columns depending on which col they go with. fill = "-" fills all the empty columns with dashes. Finally, we drop the a and step_num columns since they're already there in the original data and we don't want to have copies of them.

(Note, we needed step_num to still exist at the spread step, because we wanted to keep each row aligned with the original rows. Without step_num, the data would get spread into a wider, shorter format that would have misaligned rows.)

library(dplyr); library(tidyr)
df %>%  
  mutate(col = paste0(step_num, "_duration")) %>%
  spread(col, duration, fill = "-") %>%
  select(-a, -step_num)) %>% 
  bind_cols(df, .)  # Edit, per excellent suggestion from M-M


  a step_num duration 1_duration 2_duration 3_duration
1 1        1        5          5          -          -
2 1        2        4          -          4          -
3 1        3        1          -          -          1
4 2        1        7          7          -          -
5 2        2        2          -          2          -
6 2        3        9          -          -          9
7 3        1        1          1          -          -
8 3        2        1          -          1          -
9 3        3        3          -          -          3
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
  • Just an idea. I think this reads better. ```df %>% mutate(col = paste0(step_num, "_duration")) %>% spread(col, duration, fill= "-") %>% select(-a, -step_num) %>% bind_cols(df,.)``` – M-- Jun 06 '19 at 22:32
2

Simple tidyverse solution:

library(tidyverse)

df %>%
  mutate(step = step_num) %>%
  spread(step, duration, fill = '-') %>%
  rename_all( ~ gsub('(\\d+)', 'duration_\\1', .))

#   a step_num duration_1 duration_2 duration_3
# 1 1        1          5          -          -
# 2 1        2          -          4          -
# 3 1        3          -          -          1
# 4 2        1          7          -          -
# 5 2        2          -          2          -
# 6 2        3          -          -          9
# 7 3        1          1          -          -
# 8 3        2          -          1          -
# 9 3        3          -          -          3
utubun
  • 4,400
  • 1
  • 14
  • 17
2

Or an option with dcast from data.table

library(data.table)
dcast(setDT(df),  a + step_num ~  
        paste0("duration_", step_num), value.var = 'duration')
#    a step_num duration_1 duration_2 duration_3
#1: 1        1          5         NA         NA
#2: 1        2         NA          4         NA
#3: 1        3         NA         NA          1
#4: 2        1          7         NA         NA
#5: 2        2         NA          2         NA
#6: 2        3         NA         NA          9
#7: 3        1          1         NA         NA
#8: 3        2         NA          1         NA
#9: 3        3         NA         NA          3

NOTE: It is better to have NA instead of - as NA is easily removable with is.na/complete.cases/na.omit and it wouldn't change the class of the column to character

data

df <- structure(list(a = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), step_num = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), duration = c(5L, 4L, 1L, 7L, 
2L, 9L, 1L, 1L, 3L)), class = "data.frame", row.names = c(NA, 
-9L))
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662