A sample of my data is as follows:

df1 <- read.table(text = "var Time
12O 12
13O 11
22B 45
33Z 22
21L 2
11M 13", header = TRUE)

I want to split the values in column "var" into one column per character, to get the following data:

df2 <- read.table(text = "Group1 Group2 Group3
1 2 O
1 3 O
2 2 B
3 3 Z
2 1 L
1 1 M", header = TRUE)

I tried the following code:

 df2 <- df1 %>% separate(var, into = c('Group1', 'Group2','Group3'), sep = 1)

I get an error. I have searched for the cause but could not find it.

Henrik

3 Answers

As far as I know (see "Separate outputs empty separator error for each row independently"), tidyr's separate() cannot split on an empty separator, so sep = "" is not an option here. One possibility is str_split() from stringr or strsplit() from base R.

So, using str_split():

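library(dplyr)
library(tidyr)
library(stringr)

# Time is unique per row here, so it identifies the original rows
df1 %>%
 mutate(var = str_split(var, pattern = "")) %>%  # list-column of single characters
 unnest(var) %>%                                   # one row per character
 group_by(Time) %>%
 mutate(val = var,
        var = paste0("Group", row_number())) %>%  # Group1, Group2, Group3 within each Time
 spread(var, val) %>%
 ungroup()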

   Time Group1 Group2 Group3
  <int> <chr>  <chr>  <chr> 
1     2 2      1      L     
2    11 1      3      O     
3    12 1      2      O     
4    13 1      1      M     
5    22 3      3      Z     
6    45 2      2      B

Using strsplit():

df1 %>%
 mutate(var = strsplit(as.character(var), split = "", fixed = TRUE)) %>%
 unnest(var) %>%
 group_by(Time) %>%
 mutate(val = var,
        var = paste0("Group", row_number())) %>%
 spread(var, val) %>%
 ungroup()

To give the new columns an appropriate class (character, integer, etc.), pass convert = TRUE to spread().
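
As a side note, separate() can still handle this particular case if the split points are given as character positions rather than as a separator. The following is a minimal sketch, assuming every value in var has exactly three characters; sep = c(1, 2) and convert = TRUE are my choices for illustration and were not part of the original attempt.

```r
library(tidyr)

df1 <- read.table(text = "var Time
12O 12
13O 11
22B 45
33Z 22
21L 2
11M 13", header = TRUE, stringsAsFactors = FALSE)

# A numeric `sep` is read as character positions to cut at.
# Three `into` names need exactly two positions: after the 1st
# and after the 2nd character.
df2 <- separate(df1, var,
                into = c("Group1", "Group2", "Group3"),
                sep = c(1, 2),
                convert = TRUE)  # digit columns become integer

df2
```

The original call failed because sep = 1 supplies a single cut position, which yields only two pieces for three into names.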

tmfmnk

A possible base/stringr solution:

res <- as.data.frame(do.call(rbind, strsplit(
  stringr::str_replace_all(df1$var, "([0-9])([0-9])([A-Z])", "\\1 \\2 \\3"),
  " ")))
names(res) <- paste0("Group", 1:ncol(res))

cbind(df1["Time"], res)
  Time Group1 Group2 Group3
1   12      1      2      O
2   11      1      3      O
3   45      2      2      B
4   22      3      3      Z
5    2      2      1      L
6   13      1      1      M
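
The same idea can be sketched in base R alone by splitting on the empty string instead of inserting spaces first. This assumes all values in var have the same number of characters (otherwise rbind would recycle values); the type.convert() step is an optional addition to get integer columns.

```r
df1 <- read.table(text = "var Time
12O 12
13O 11
22B 45
33Z 22
21L 2
11M 13", header = TRUE, stringsAsFactors = FALSE)

# Split each string into single characters and stack them as matrix rows
chars <- do.call(rbind, strsplit(as.character(df1$var), split = ""))

res <- as.data.frame(chars, stringsAsFactors = FALSE)
names(res) <- paste0("Group", seq_len(ncol(res)))

# Convert digit columns to integer; as.is = TRUE keeps letters as character
res[] <- lapply(res, type.convert, as.is = TRUE)

out <- cbind(df1["Time"], res)
out
```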
NelsonGon

If you want to retain the original column, you can use str_split_fixed() from the stringr package and cbind the result to your existing data frame:

cbind(df1, str_split_fixed(as.character(df1$var),"", n = 3))

  var Time 1 2 3
1 12O   12 1 2 O
2 13O   11 1 3 O
3 22B   45 2 2 B
4 33Z   22 3 3 Z
5 21L    2 2 1 L
6 11M   13 1 1 M
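
To get the Group1 to Group3 names from the question instead of 1, 2, 3, the matrix can be given column names before binding. A small sketch; the intermediate name parts is only for illustration.

```r
library(stringr)

df1 <- read.table(text = "var Time
12O 12
13O 11
22B 45
33Z 22
21L 2
11M 13", header = TRUE, stringsAsFactors = FALSE)

# Character matrix with one column per character of `var`
parts <- str_split_fixed(as.character(df1$var), "", n = 3)
colnames(parts) <- paste0("Group", 1:3)

# Convert to a data frame first so the pieces stay character, not factor
out <- cbind(df1, as.data.frame(parts, stringsAsFactors = FALSE))
out
```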
Jason Mathews