Using separate() from tidyr to separate letter code into 4 variables in R

Question

I have data on fish, and the ID variable is a four-letter code: the first letter is for paternity, the second for maternity, the third for treatment and the fourth for the individual. A single observation looks like this: BBRG.

The code is stored as a single variable, and I need to split its letters into separate columns. Since there is no character separating them, I wasn't sure what to pass to the `sep =` argument of `separate()`.

Example data:

test.dataframe <- data.frame(observation = c(1:10),
                             VIE.Code = c("BBRG", "BRBR", "PPWG", "RRWW",
                                          "WRWR", "BBBP", "PBPB", "PPGW",
                                          "RWRW", "BGBG"))

FYI, `separate` is a `tidyr` function, not a `dplyr` one. I've edited accordingly — camille, Oct 29 '18 at 16:19
Also, it's very difficult to help well without a [reproducible question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). That includes posting a representative sample of data, the code you've tried that hasn't worked, and the desired outcome. Otherwise, we're kind of in the dark about what's going on. — camille, Oct 29 '18 at 16:20
You can edit your question to put code there, which is much easier to read and format than in comments — camille, Oct 29 '18 at 17:20
Hi, thanks for replying. I understand, and I've attached the code I used to create a similar but simpler data frame that I'm testing things on. ` test.dataframe <- data.frame(observation = c(1:10), VIE.Code = c("BBRG", "BRBR", "PPWG", "RRWW", "WRWR", "BBBP", "PBPB", "PPGW", "RWRW", "BGBG")) ` Apologies, I'm quite new to R and all this — T. Miller, Oct 30 '18 at 14:39
I've tried to format that as code by putting it in backticks (as the markdown help says), but it hasn't worked — T. Miller, Oct 30 '18 at 14:42
You can edit the question to include code; it's much easier to format and read there than in comments. — camille, Oct 30 '18 at 14:50

akrun · Answer 1 · 2018-10-29T15:26:03.863


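We can use a regex lookbehind as the separator: `(?<=[A-Z])` matches the zero-width position right after each uppercase letter, so the split does not consume any of the letters.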

library(tidyverse)
df1 %>%
  separate(ID, into = c("paternity", "maternity", "treatment", "individual"),
           sep = "(?<=[A-Z])")
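A related regex idea that needs no add-on packages is base R's `utils::strcapture()`, which captures each letter in its own group instead of splitting at lookaround positions. This is a swapped-in technique rather than the lookaround itself, sketched here on the question's codes and assuming every code is exactly four characters long; `codes`, `proto` and `parts` are illustrative names.

```r
# Codes from the question's example data
codes <- c("BBRG", "BRBR", "PPWG", "RRWW", "WRWR",
           "BBBP", "PBPB", "PPGW", "RWRW", "BGBG")

# The prototype data frame fixes the names and types of the output columns
proto <- data.frame(paternity = character(), maternity = character(),
                    treatment = character(), individual = character(),
                    stringsAsFactors = FALSE)

# One capture group per letter; a code that is not exactly four
# characters long does not match and comes back as a row of NA
parts <- utils::strcapture("^(.)(.)(.)(.)$", codes, proto = proto)
parts
```

The result is a data frame with one column per letter, which can be bound back onto the original data with `cbind()`.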

Or give `sep` the character positions to split at; with numeric values, `separate()` splits the string after positions 1, 2 and 3.

df1 %>%
  separate(ID, into = c("paternity", "maternity", "treatment", "individual"),
           sep = c(1, 2, 3))
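The same position-based idea can be sketched in base R, assuming every code is exactly four characters long: `substr(codes, i, i)` returns the i-th letter of every code at once. The names `codes`, `positions` and `parts` are illustrative.

```r
# Codes from the question's example data (assumed to be four letters each)
codes <- c("BBRG", "BRBR", "PPWG", "RRWW", "WRWR",
           "BBBP", "PBPB", "PPGW", "RWRW", "BGBG")

# Named character positions: one output column per position
positions <- c(paternity = 1, maternity = 2, treatment = 3, individual = 4)

# lapply() keeps the names, so each list element becomes a named column
parts <- lapply(positions, function(i) substr(codes, i, i))
parts <- as.data.frame(parts, stringsAsFactors = FALSE)
parts
```

Because the positions are named, the output columns pick up those names directly, and changing the layout of the code only means editing `positions`.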

In base R, we can split each code into single characters with `strsplit()` and stack the pieces row-wise into a character matrix:

do.call(rbind, strsplit(df1$ID, ""))
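Extending the base R split to the question's own data frame: a sketch that names the four pieces and appends them to `test.dataframe`. The `as.character()` call is an addition for the case where `VIE.Code` is a factor (the `data.frame()` default before R 4.0.0), since `strsplit()` only accepts character input; `codes`, `parts` and `result` are illustrative names.

```r
test.dataframe <- data.frame(observation = 1:10,
                             VIE.Code = c("BBRG", "BRBR", "PPWG", "RRWW",
                                          "WRWR", "BBBP", "PBPB", "PPGW",
                                          "RWRW", "BGBG"))

# Convert first so this works whether VIE.Code is a factor or character
codes <- as.character(test.dataframe$VIE.Code)

# strsplit(x, "") splits each code into single letters; rbind stacks them
parts <- do.call(rbind, strsplit(codes, ""))
colnames(parts) <- c("paternity", "maternity", "treatment", "individual")

# Append the four new columns to the original data frame
result <- data.frame(test.dataframe, parts, stringsAsFactors = FALSE)
result
```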

data

df1 <- data.frame(ID = c("BBRG", "BBGR"), stringsAsFactors = FALSE)

edited Oct 29 '18 at 15:26

answered Oct 29 '18 at 15:20

akrun


Hi there, that sounds like a really good method, but when I try it on my dataset I get the error `Error in UseMethod("separate_") : no applicable method for 'separate_' applied to an object of class "factor"`. Is my ID variable in the wrong format? – T. Miller Oct 29 '18 at 16:13
@T.Miller It should work for `factor` as well. If there is an issue try `df1 %>% mutate(ID = as.character(ID)) %>% separate(ID, ..` – akrun Oct 29 '18 at 16:14
