Regular expression on separate function of Tidyr

Question

I need to separate one column into two with tidyr.

The column has text like: I am Sam. The text always contains exactly two white spaces, and it can contain any other symbols: [a-z][0-9][!\ºª, etc...].

The problem is that I need to split it into two columns: column one: I am, and column two: Sam.

I can't find a regular expression to separate at the second blank space.

Could you help me please?

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. — Jaap, May 15 '16 at 16:02

akrun · Answer 1 · 2016-05-15T16:27:02.757

4

We can use extract from tidyr. We match zero or more characters, greedily, and place them in a capture group ((.*)), followed by one or more spaces (\\s+) and another capture group that contains only non-white-space characters (\\S+), to separate the original column into two columns. Because .* is greedy, the split happens at the last run of white space.

library(tidyr)
extract(df1, Col1, into = c("Col1", "Col2"), "(.*)\\s+(\\S+)")
#   Col1 Col2
#1  I am  Sam
#2 He is  Sam

data

df1 <- data.frame(Col1 = c("I am Sam", "He is Sam"), stringsAsFactors=FALSE)
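
To see what each capture group picks up, the same pattern can be tried outside tidyr with base R's sub(). This is only a sketch: the anchors ^ and $ and the variable names are additions for illustration, not part of the answer above.

```r
# Same sample strings as df1$Col1
x <- c("I am Sam", "He is Sam")

# Group 1: everything up to the last run of white space (greedy .*)
# Group 2: the final run of non-white-space characters
pattern <- "^(.*)\\s+(\\S+)$"

# Keep one group at a time by replacing the whole match with a backreference
first  <- sub(pattern, "\\1", x, perl = TRUE)
second <- sub(pattern, "\\2", x, perl = TRUE)

data.frame(Col1 = first, Col2 = second, stringsAsFactors = FALSE)
```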

edited May 15 '16 at 16:27

answered May 15 '16 at 16:23

akrun

1

Great answer, but you should explain what the regex is doing so OP can understand. – tblznbits May 15 '16 at 16:25

score 3 · Answer 2 · answered May 15 '16 at 16:40

As an alternative, given:

library(tidyr)
df <- data.frame(txt = "I am Sam")

you can use

separate(df, txt, c("a", "b"), sep="(?<=\\s\\S{1,100})\\s")
#      a   b
# 1 I am Sam

with separate using stringi::stri_split_regex (ICU engine), or

separate(df, txt, c("a", "b"), sep="^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE)

with the older (?) separate using base::strsplit (Perl engine). See also

strsplit("I am Sam", "^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE)
# [[1]]
# [1] "I am" "Sam"

But it might be a bit "esoterique"...
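
In the Perl version, (*SKIP)(*FAIL) discards everything up to and including the first white space, so only the second one is left as a split point. A possibly more readable variant, sketched here under the question's assumption that the text has exactly two white spaces, is to split on the white space that is followed only by non-white-space characters up to the end of the string. The lookahead pattern below is an alternative, not the method given above.

```r
# Split on the single white space that precedes the last word.
# (?=\\S+$) is a lookahead: it checks the rest of the string without consuming it.
last_space <- "\\s(?=\\S+$)"

parts <- strsplit(c("I am Sam", "He is Sam"), last_space, perl = TRUE)
parts
```

The same pattern could in principle be passed as the sep argument of separate.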
