0

I need to separate one column into two columns with tidyr.

The column has text like: I am Sam. The text always has exactly two white spaces, and it can contain any other symbols: [a-z][0-9][!\ºª, etc...].

The problem is that I need to split it into two columns: column one: I am, and column two: Sam.

I can't find a regular expression to separate at the second blank space.

Could you help me please?

Jaap
    Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap May 15 '16 at 16:02

2 Answers

4

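We can use extract from tidyr. We match zero or more characters, greedily, and place them in a capture group ((.*)), followed by one or more whitespace characters (\\s+) and another capture group that contains only non-whitespace characters (\\S+), to separate the original column into two columns. Because the first group is greedy, the split happens at the last run of whitespace, which is the second space when the text has exactly two.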

library(tidyr)
extract(df1, Col1, into = c("Col1", "Col2"), "(.*)\\s+(\\S+)")
#   Col1 Col2
#1  I am  Sam
#2 He is  Sam

data

df1 <- data.frame(Col1 = c("I am Sam", "He is Sam"), stringsAsFactors=FALSE)
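
To see what the two capture groups pick up without tidyr, here is a minimal base R sketch. It assumes the same pattern, anchored with ^ and $ and run with perl = TRUE; the vector x and the names first and second are made up for illustration.

```r
# Hypothetical sample strings, each with exactly two spaces
x <- c("I am Sam", "He is Sam", "x1 !? y2")

# Same pattern as above, anchored so that sub() replaces the whole string
pattern <- "^(.*)\\s+(\\S+)$"

# Keep only the first capture group (everything before the last whitespace)
first <- sub(pattern, "\\1", x, perl = TRUE)

# Keep only the second capture group (the trailing non-whitespace run)
second <- sub(pattern, "\\2", x, perl = TRUE)

print(data.frame(Col1 = first, Col2 = second, stringsAsFactors = FALSE))
```

Since (.*) is greedy, a string with more than two spaces would still be split at its last space, not its second.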
akrun
3

As an alternative, given:

library(tidyr)
df <- data.frame(txt = "I am Sam")

you can use

separate(df, txt, c("a", "b"), sep="(?<=\\s\\S{1,100})\\s")
#      a   b
# 1 I am Sam

with separate using stringi::stri_split_regex (ICU engine), or

separate(df, txt, c("a", "b"), sep="^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE) 

with the older (?) separate using base::strsplit (Perl engine). See also

strsplit("I am Sam", "^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE)
# [[1]]
# [1] "I am" "Sam" 

But it might be a bit "esoterique"...
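
If the regex tricks above feel too opaque, a plainer route is to locate the second whitespace character and cut the string there. This is only a sketch in base R, not something tidyr provides; split_at_second_space is a hypothetical helper name.

```r
# Hypothetical helper: split each string at its second whitespace character
split_at_second_space <- function(x) {
  # Positions of every whitespace character, one integer vector per string
  hits <- gregexpr("\\s", x, perl = TRUE)
  # Take the second position; NA when a string has fewer than two matches
  pos <- vapply(hits, function(p) p[2], integer(1))
  data.frame(
    a = substr(x, 1, pos - 1),         # text before the second space
    b = substr(x, pos + 1, nchar(x)),  # text after the second space
    stringsAsFactors = FALSE
  )
}

res <- split_at_second_space(c("I am Sam", "He is Sam"))
print(res)
```

Strings with fewer than two whitespace characters yield NA in both columns, which may or may not be what you want.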

lukeA