I have a column of strings that each contain multiple dots. I want to split this column into two: the substring before the first dot and the substring after it.
For example, I want to turn this:
comb num
UWEA.n.49.sp 3
KYFZ.n.89.kr 5
...
into this:
a b num
UWEA n.49.sp 3
KYFZ n.89.kr 5
...
I'm using the separate function from tidyr but cannot get the regexp correct. I'm trying to use the regex style from this answer:
foo %>%
  separate(comb, into = c('a', 'b'),
           sep = "([^.]+)\\.(.*)")
The idea is that column a is determined by the first capture group ([^.]+), which matches one or more non-dot characters. That is followed by the first dot, and then the second capture group (.*) matches whatever remains and should become column b.
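To check what each capture group actually picks up, the pattern can be tried outside of separate with base R's sub() and backreferences. This is only a sketch on hand-typed values taken from the example above; toy is a hypothetical two-row stand-in for foo. It also tries tidyr's extract(), which assigns one capture group per new column, on the assumption that it fits this use case.

```r
library(tidyr)

x <- "UWEA.n.49.sp"
pattern <- "([^.]+)\\.(.*)"

# Backreferences show what each capture group matched
first_group  <- sub(pattern, "\\1", x)  # text before the first dot
second_group <- sub(pattern, "\\2", x)  # everything after the first dot

# toy is a hypothetical stand-in for foo, built from the example rows
toy <- data.frame(comb = c("UWEA.n.49.sp", "KYFZ.n.89.kr"),
                  num = c(3, 5),
                  stringsAsFactors = FALSE)

# extract() fills each new column from one capture group of the regex
res_extract <- tidyr::extract(toy, comb, into = c("a", "b"), regex = pattern)
res_extract
```

If the groups come back as expected here, the regex itself is fine and the issue is more likely in how separate interprets its sep argument.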
However, this doesn't seem to match anything; both new columns come out empty:
a b num
3
5
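One likely explanation, offered as a sketch rather than a definitive diagnosis: sep in separate() describes the delimiter between the new columns, not a pattern with capture groups. Because the regex above matches the entire string, the whole value is consumed as the delimiter and only empty pieces are left on either side. Under that assumption, matching just the literal dot and passing extra = "merge" should keep everything after the first dot together. Here toy is a hypothetical stand-in for foo.

```r
library(tidyr)

# toy is a hypothetical stand-in for foo, built from the example rows
toy <- data.frame(comb = c("UWEA.n.49.sp", "KYFZ.n.89.kr"),
                  num = c(3, 5),
                  stringsAsFactors = FALSE)

# sep matches only the dot; extra = "merge" stops splitting once
# length(into) pieces exist, so later dots stay inside column b
res_merge <- separate(toy, comb, into = c("a", "b"),
                      sep = "\\.", extra = "merge")
res_merge
```

The design choice here is to keep sep as simple as possible and let extra control how many splits happen, instead of encoding "first dot only" in the regex.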
Here's my dummy dataset:
library(dplyr)
library(tidyr)

# Ten random strings of the form "ABCD.n.42.xy" plus a numeric column
foo <- data.frame(
  comb = replicate(10,
    paste(paste(sample(LETTERS, 4), collapse = ''),
          sample(c('p', 'n'), 1),
          sample(1:100, 1),
          paste(sample(letters, 2), collapse = ''),
          sep = '.')
  ),
  num = sample(1:10, 10, replace = TRUE))