0

I have a problem that is an extension of a well-covered issue here on SE. I.e:

Split a column of a data frame to multiple columns

My data has a column with a string format, comma-separated, but of no fixed length.

data = data.frame(id = c(1,2,3), treatments = c("1,2,3", "2,3", "8,9,1,2,4"))

So I would like to have my dataframe eventually be in the proper tidy/long form of:

id    treatments
1     1
1     2
1     3
...
3     1
3     2
3     4

Something like separate or strsplit doesn't seem on it's own to be the solution. Separate fails with warnings that various columns have too many values (NB id 3 has more values than id 1).

Thanks

JohnL_10
  • 549
  • 5
  • 15

3 Answers3

3

You can use tidyr::separate_rows:

library(tidyr)
separate_rows(data, treatments)

#   id treatments
#1   1          1
#2   1          2
#3   1          3
#4   2          2
#5   2          3
#6   3          8
#7   3          9
#8   3          1
#9   3          2
#10  3          4
Psidom
  • 209,562
  • 33
  • 339
  • 356
0

Using dplyr and tidyr packages:

data  %>% 
  separate(treatments, paste0("v", 1:5)) %>% 
  gather(var, treatments, -id) %>% 
  na.exclude %>% 
  select(id, treatments) %>%
  arrange(id)


   id treatments
1   1          1
2   1          2
3   1          3
4   2          2
5   2          3
6   3          8
7   3          9
8   3          1
9   3          2
10  3          4
Adam Quek
  • 6,973
  • 1
  • 17
  • 23
0

You can also use unnest:

library(tidyverse)
data %>% 
  mutate(treatments = stringr::str_split(treatments, ",")) %>% 
  unnest()

   id treatments
1   1          1
2   1          2
3   1          3
4   2          2
5   2          3
6   3          8
7   3          9
8   3          1
9   3          2
10  3          4
HubertL
  • 19,246
  • 3
  • 32
  • 51