0

I have the following df and would like to spread/cast.

df <- data.frame(experiment=c("ex3", "ex1", "ex1", "ex2","ex7", "ex7"),
                 mod=c("mod1", "mod1","mod7", "mod8","mod3", "mod9"))

df

  experiment  mod
1        ex3 mod1
2        ex1 mod1
3        ex1 mod7
4        ex2 mod8
5        ex7 mod3
6        ex7 mod9

desired output

  experiment mod_A mod_B
1        ex1  mod1  mod7
2        ex2  mod8  <NA>
3        ex3  mod1  <NA>
4        ex7  mod3  mod9

I have tried tidyr::spread but get an error

df %>%  spread(experiment, mod)

Error: Duplicate identifiers for rows (2, 3), (5, 6)

Any help would be appreciated.

Moe
  • 131
  • 1
  • 9

2 Answers2

2

We can create an ID column for each experiment group to overcome this issue.

library(dplyr)
library(tidyr)

df2 <- df %>%
  arrange(experiment, mod) %>%
  group_by(experiment) %>%
  mutate(ID = 1:n()) %>%
  spread(ID, mod) %>%
  ungroup()
df2
# # A tibble: 4 x 3
#   experiment `1`   `2`  
#   <fct>      <fct> <fct>
# 1 ex1        mod1  mod7 
# 2 ex2        mod8  NA   
# 3 ex3        mod1  NA   
# 4 ex7        mod3  mod9
www
  • 38,575
  • 12
  • 48
  • 84
0

With dplyr (version 0.7.5), and tidyr (version 0.8.1) you can simply summarise and separate.

df %>%
  group_by(experiment) %>%
  summarise(mod = paste(mod, collapse = ",")) %>%
  separate(mod, into = c("mod_A", "mod_B"))

# A tibble: 4 x 3
#   experiment mod_A mod_B
#   <chr>      <chr> <chr>
# 1 ex1        mod1  mod7 
# 2 ex2        mod8  NA   
# 3 ex3        mod1  NA   
# 4 ex7        mod3  mod9 
hpesoj626
  • 3,529
  • 1
  • 17
  • 25