
I have fish ID data, and the ID variable is a four-letter code: the first letter is for paternity, the second for maternity, the third for treatment and the fourth for the individual. A single observation may look like this: BBRG.

The IDs are stored in a single variable, and I need to split the letters into separate columns. Since there is nothing separating them, I wasn't sure what to put in the `sep =` argument of `separate`.

Example data:

test.dataframe <- data.frame(observation = c(1:10),
                             VIE.Code = c("BBRG", "BRBR", "PPWG", "RRWW",
                                          "WRWR", "BBBP", "PBPB", "PPGW",
                                          "RWRW", "BGBG"))
zx8754
T. Miller
  • FYI, `separate` is a `tidyr` function, not a `dplyr` one. I've edited accordingly – camille Oct 29 '18 at 16:19
  • Also, it's very difficult to help well without a [reproducible question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). That includes posting a representative sample of data, the code you've tried that hasn't worked, and the desired outcome. Otherwise, we're kind of in the dark about what's going on. – camille Oct 29 '18 at 16:20
  • You can edit your question to put code there, which is much easier to read and format than in comments – camille Oct 29 '18 at 17:20
  • Hi, thanks for replying. I understand, and I've attached some code I used to create a similar but simpler data frame that I'm testing things on: ` test.dataframe <- data.frame(observation = c(1:10), VIE.Code = c("BBRG", "BRBR", "PPWG", "RRWW", "WRWR", "BBBP", "PBPB", "PPGW", "RWRW", "BGBG")) ` Apologies, I'm quite new to R and all this – T. Miller Oct 30 '18 at 14:39
  • I've tried to format that as code by putting it in backticks (as the markdown help says), but it hasn't worked – T. Miller Oct 30 '18 at 14:42
  • You can edit the question to include code; it's much easier to format and read there than in comments. – camille Oct 30 '18 at 14:50

1 Answer


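We can use a regex lookaround: the lookbehind `(?<=[A-Z])` matches the empty position after each capital letter, so `separate` splits between the letters.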

library(tidyverse)
df1 %>% 
  separate(ID, into = c("paternity", "maternity", "treatment", "individual"), 
           sep = "(?<=[A-Z])")
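If tidyr is not available, a comparable result can be sketched in base R by *matching* each capital letter instead of splitting on a lookaround. This is a swapped-in technique, not the regex used above. It assumes every code is exactly four upper-case letters, and it reuses the question's `test.dataframe` with `stringsAsFactors = FALSE` added so the codes stay character; `letter_mat` and `split_df` are illustrative names.

```r
# The question's example data; stringsAsFactors = FALSE keeps VIE.Code as character
test.dataframe <- data.frame(observation = 1:10,
                             VIE.Code = c("BBRG", "BRBR", "PPWG", "RRWW",
                                          "WRWR", "BBBP", "PBPB", "PPGW",
                                          "RWRW", "BGBG"),
                             stringsAsFactors = FALSE)

# gregexpr("[A-Z]", ...) locates every capital letter in each code and
# regmatches() extracts them, giving a list of character vectors
letter_list <- regmatches(test.dataframe$VIE.Code,
                          gregexpr("[A-Z]", test.dataframe$VIE.Code))

# Assumes every code has four letters, so the list stacks into a 10 x 4 matrix
letter_mat <- do.call(rbind, letter_list)
colnames(letter_mat) <- c("paternity", "maternity", "treatment", "individual")

# Keep the original columns and add the four new ones
split_df <- data.frame(test.dataframe, letter_mat, stringsAsFactors = FALSE)
split_df
```

Because this matches letters rather than splitting on every character, any non-letter characters in a code would simply be skipped.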

Or specify `sep` as the character positions to split at (after the 1st, 2nd and 3rd characters):

df1 %>%
  separate(ID, into = c("paternity", "maternity", "treatment", "individual"),
           sep = c(1, 2, 3))
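The position-based split can be imitated in base R with `substr()`, taking one character at each of the four positions. This is a sketch assuming fixed-width four-letter codes, again using the question's data; `parts` and `split_mat` are hypothetical names, not part of the answer.

```r
test.dataframe <- data.frame(observation = 1:10,
                             VIE.Code = c("BBRG", "BRBR", "PPWG", "RRWW",
                                          "WRWR", "BBBP", "PBPB", "PPGW",
                                          "RWRW", "BGBG"),
                             stringsAsFactors = FALSE)

parts <- c("paternity", "maternity", "treatment", "individual")
codes <- test.dataframe$VIE.Code

# substr(codes, i, i) pulls the i-th character out of every code at once;
# vapply() stacks the four results as columns of a 10 x 4 character matrix
split_mat <- vapply(seq_along(parts),
                    function(i) substr(codes, i, i),
                    character(length(codes)))
colnames(split_mat) <- parts

result <- data.frame(test.dataframe, split_mat, stringsAsFactors = FALSE)
result
```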

A base R method is to split each string into single characters with `strsplit` and bind the pieces into a matrix:

do.call(rbind, strsplit(df1$ID, ""))
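That call returns an unnamed character matrix. A short sketch of naming its columns and keeping them next to the original `ID` follows; `id_mat` and `df_split` are hypothetical names. It assumes all IDs have the same length, since `rbind` would otherwise recycle shorter rows rather than fail.

```r
df1 <- data.frame(ID = c("BBRG", "BBGR"), stringsAsFactors = FALSE)

# strsplit(x, "") breaks each ID into single characters (a list);
# do.call(rbind, ...) stacks the pieces into a 2 x 4 character matrix
id_mat <- do.call(rbind, strsplit(df1$ID, ""))
colnames(id_mat) <- c("paternity", "maternity", "treatment", "individual")

# Put the original ID and the four new columns side by side
df_split <- data.frame(df1, id_mat, stringsAsFactors = FALSE)
df_split
```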

data

df1 <- data.frame(ID = c("BBRG", "BBGR"), stringsAsFactors = FALSE)
akrun
  • Hi there, that sounds like a really good method, but when I tried it on my dataset I got the error `Error in UseMethod("separate_") : no applicable method for 'separate_' applied to an object of class "factor"`. Is my ID variable in the wrong format? – T. Miller Oct 29 '18 at 16:13
  • @T.Miller It should work for `factor` as well. If there is an issue try `df1 %>% mutate(ID = as.character(ID)) %>% separate(ID, ..` – akrun Oct 29 '18 at 16:14
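Two hedged notes on that error. First, the class in the message is the class of the first argument that `separate` received, and `separate` expects the data frame first and the column name second, so the message suggests a column rather than the whole data frame was passed in. Second, before R 4.0.0 `data.frame()` stored text columns as factors by default, and the base R `strsplit` route rejects factors outright. The sketch below reproduces that with `stringsAsFactors = TRUE` (an assumption standing in for the old default) and converts back with `as.character()`; `split_attempt` and `code_mat` are illustrative names.

```r
# Hypothetical reproduction: stringsAsFactors = TRUE imitates the pre-4.0.0 default
test.dataframe <- data.frame(observation = 1:3,
                             VIE.Code = c("BBRG", "BRBR", "PPWG"),
                             stringsAsFactors = TRUE)
print(is.factor(test.dataframe$VIE.Code))

# strsplit() accepts only character input, so a factor column signals an error;
# try() captures it so the script can continue
split_attempt <- try(strsplit(test.dataframe$VIE.Code, ""), silent = TRUE)
print(inherits(split_attempt, "try-error"))

# Converting the column back to character first avoids the problem
test.dataframe$VIE.Code <- as.character(test.dataframe$VIE.Code)
code_mat <- do.call(rbind, strsplit(test.dataframe$VIE.Code, ""))
colnames(code_mat) <- c("paternity", "maternity", "treatment", "individual")
code_mat
```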