2

I have a vector of text values, something like:

Col1
camel; cow; giraffe; panda; zebra
camel; giraffe; zebra
panda; zebra

And I would like to separate these values into an equal number of columns so that the values align vertically in the same columns, with NA where there are gaps so that I get:

Col1, Col2, Col3, Col4, Col5
camel, cow, giraffe, panda, zebra
camel, NA, giraffe, NA, zebra
NA, NA, NA, panda, zebra

How can I accomplish this? I've tried using the 'separate' function in dplyr but this returns error messages, whilst using the 'split' function in splitshapestack allows me to separate the columns but they don't return values that are aligned vertically. I have also tried using the 'lapply' and 'match' in a similar way to this post but again, I keep getting error messages.

Harry
  • 61
  • 6
  • 1
    Check `splitstackshape::cSplit_e`: `cSplit_e(d, "Col1", "; ", type = "character", mode = "value", fill = NA, drop = TRUE)` – Henrik Mar 02 '21 at 23:33

1 Answers1

3

You can pivot your data a few times to align these values:

library(dplyr)

df %>% 
  tibble::rowid_to_column("id") %>% 
  tidyr::separate_rows(Col1, sep = "; ") %>% 
  tidyr::pivot_wider(id_cols = id,
                     names_from = Col1,
                     values_from = Col1) %>% 
  dplyr::select(-id) %>% 
  magrittr::set_colnames(paste0("Col", 1:ncol(.)))

Output

 Col1  Col2  Col3    Col4  Col5 
  <chr> <chr> <chr>   <chr> <chr>
1 camel cow   giraffe panda zebra
2 camel NA    giraffe NA    zebra
3 NA    NA    NA      panda zebra

How it works

  1. rowid_to_column keeps track of the row numbers so when the data are pivoted to a longer format, we don't lose track of which values belong in which rows.
  2. separate_rows will separate Col1 and pivot the data to a longer format. Suggested by @Adam as an improvement.
  3. pivot_wider aligns everything into the columns you specified.

Data

structure(list(Col1 = c("camel; cow; giraffe; panda; zebra", 
"camel; giraffe; zebra", "panda; zebra")), class = "data.frame", row.names = c(NA, 
-3L))
LMc
  • 12,577
  • 3
  • 31
  • 43
  • 1
    You can save the `pivot_longer()` step by using `separate_rows()` as well. –  Mar 02 '21 at 22:19
  • You will lose track of which values belong to which rows. – LMc Mar 02 '21 at 22:22
  • 1
    Not if you put it after your `rowid_to_column()` step! –  Mar 02 '21 at 22:24
  • Good point! Thanks for the suggestion. – LMc Mar 02 '21 at 22:27
  • @LMc Thanks! The 'How it works' is really useful but I've tried running your code but I get an error message: "Error: Column 2312 must be named. Use .name_repair to specify repair." Any idea why this would be? – Harry Mar 02 '21 at 22:28
  • Try the updated code with improvements suggested by @Adam. This works on the example posted in your original question. Is this an error you're getting running it on your actual code? If so, then your example might not be capturing the nuances of your problem. – LMc Mar 02 '21 at 22:31
  • Thanks both - @LMc, it turns out I had 2 blank cells, so I replaced them with NA and re-ran the code and it worked perfectly. Problem solved! – Harry Mar 02 '21 at 22:52