1

I have a R dataFrame that contains the following:

column1          column2
score1...test1     10
score2...test2     11
score3...test3     15

I would like to reshape my dataFrame into the following:

column1          column2    score     test
score1...test1     10       score1    test1
score1...test2     11       score2    test2
score1...test3     15       score3    test3

I tried to use

library(stringr)
temp=str_split_fixed(df, " ...", 4)

but I am getting this

 [,1] [,2] [,3] [,4]

How can tackle this problem?

Cœur
  • 37,241
  • 25
  • 195
  • 267
user3841581
  • 2,637
  • 11
  • 47
  • 72
  • You are using this function in correctly. You should split a column, not the whole data set. And the pattern should be just "..." rather " ...". Also, `n` should 2 instead of 4. Regardlles, this seems like a bug in the `stringr` package because `str_split_fixed(df$column1, "...", 2)` doesn't work properly, whereas `stringi::stri_split_fixed(df$column1, "...", 2, simplify = TRUE)` works. And this `str_split_fixed(df$column1, "\\.\\.\\.", 2)` works too, while it shouldn't as this should be `fixed`. The whole thing looks messed up. – David Arenburg Mar 14 '16 at 12:00

2 Answers2

4

You can do

library(splitstackshape)
cSplit(df, 1, "...", drop=F)
#           column1 column2 column1_1 column1_2
# 1: score1...test1      10    score1     test1
# 2: score2...test2      11    score2     test2
# 3: score3...test3      15    score3     test3

Or setnames(cSplit(df, 1, "...", drop=F), 3:4, c("score", "test"))[] it you need to give custom names.

lukeA
  • 53,097
  • 5
  • 97
  • 100
2

We can do this with base R. Replace the ... with , using sub, read the strings with read.csv to create a data.frame with two columns, cbind with the original dataset to get the expected output.

  cbind(df,read.csv(text=sub('[[:punct:]]+', ',', 
      df$column1), header=FALSE, col.names=c('score', 'test')))
 #          column1 column2  score  test
 #1 score1...test1      10 score1 test1
 #2 score2...test2      11 score2 test2
 #3 score3...test3      15 score3 test3

Or we can directly use ... as the pattern in sub, replace with `,', and the rest is above.

 cbind(df,read.csv(text=sub('...', ',', 
      df$column1, fixed=TRUE), header=FALSE, 
         col.names=c('score', 'test')))

If we need a package solution, separate from tidyr can be used.

library(tidyr)
separate(df, column1, into=c("score", "test"), remove=FALSE)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • what is the spliting criteria for the second method? can I specify the "..." as my splitting criteria? – user3841581 Mar 14 '16 at 12:08
  • @user3841581 In the first two cases, we replaced the `...` with `,` and by default, the `read.csv` use `sep=","`. In `separate`, it will automatically detect the sep as `...` – akrun Mar 14 '16 at 12:09
  • sorry for the question, I already have my dataFrame, why should I read it again? If I already have the dataFrame, how does the first and second method works? – user3841581 Mar 14 '16 at 12:13
  • @user3841581 I understand that you already have a data.frame. It is a way to split the dataset if you are using the first method in `base R`. In the `separate`, it is more straightforward. – akrun Mar 14 '16 at 12:14
  • thank you so much. The problem is that, I might have pattern like "k..score2...test2...results". your method seems to split everything. That is why I would have like to have a splitting condition. – user3841581 Mar 14 '16 at 12:30
  • @user3841581 In that case use `gsub` instead of `sub` as `sub` will only replace the first occurence and use `read.csv(text=gsub('...', ',', df$column1, fixed=TRUE), header=FALSE, fill=TRUE)` – akrun Mar 14 '16 at 12:31
  • 1
    Perfect. It works well:) – user3841581 Mar 14 '16 at 12:43