Regex: extract info between two commas

Question

data<-data.frame(x=c("a,b","c","a,b","d,e,f,g"))
        x
1     a,b
2       c
3     a,b
4 d,e,f,g

I would like to split the comma-separated values in column x and write each unique value into a new column y. How can I do this? Thank you! The expected column y is:

  y
1 a
2 b
3 c
4 d
5 e
6 f
7 g

Regex is not really needed here. Something like `unique(scan(text=as.character(data$x), sep=",", what=""))` would probably do it. `strsplit()` would be another option. — Rich Scriven, Jul 11 '16 at 20:48
Or using `strsplit`. For example : `unlist(strsplit(as.character(data$x),","))` — agstudy, Jul 11 '16 at 20:49
If the data is just comma separated, there is no need for a regex, really. Otherwise, it could look like `y <- unique(unlist(str_extract_all(data$x, "[^,]+")))` or something more specific. — Wiktor Stribiżew, Jul 11 '16 at 20:51
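The comments above suggest splitting on commas and de-duplicating, but stop short of building the requested column y. A minimal base R sketch of that idea follows; the names `pieces` and `result` are illustrative and not from the original post.

```r
# Sample data from the question
data <- data.frame(x = c("a,b", "c", "a,b", "d,e,f,g"))

# Split on literal commas, flatten the list, keep first occurrences only
pieces <- unlist(strsplit(as.character(data$x), ",", fixed = TRUE))
result <- data.frame(y = unique(pieces))

print(result)
```

`as.character()` keeps this working whether x is stored as a factor or as a character vector, and `fixed = TRUE` treats the comma as a plain string rather than a regular expression.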

score 1 · Answer 1 · answered Jul 11 '16 at 20:55

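# stringsAsFactors = TRUE is needed on R >= 4.0.0, where data.frame() no longer
# converts strings to factors by default and levels(d$x) would return NULL
d <- data.frame(x = c("a,b","c","a,b","d,e,f,g"), stringsAsFactors = TRUE)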

> levels(d$x)
[1] "a,b"     "c"       "d,e,f,g"

> e <- as.character(levels(d$x))
> e
[1] "a,b"     "c"       "d,e,f,g"

> f <- strsplit(e,",")
> f
[[1]]
[1] "a" "b"

[[2]]
[1] "c"

[[3]]
[1] "d" "e" "f" "g"

> unlist(f)
[1] "a" "b" "c" "d" "e" "f" "g"
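One caveat with this approach: `levels()` removes duplicate strings, not duplicate pieces. With the question's data that happens to be enough, but if the same value appears in two different strings, a final `unique()` is still needed. A small sketch with hypothetical data (`d2` is not from the original post):

```r
# Hypothetical data: "b" appears in two different strings
d2 <- data.frame(x = c("a,b", "b,c"), stringsAsFactors = TRUE)

# levels() yields "a,b" and "b,c"; splitting them keeps both copies of "b"
pieces <- unlist(strsplit(levels(d2$x), ","))
print(pieces)

# unique() drops the repeated piece
y <- unique(pieces)
print(y)
```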

score 1 · Answer 2 · Ben Bolker · 2016-07-11T21:20:08.183

A tidyr solution:

library(tidyr)
data %>% unnest(x=strsplit(as.character(x),",")) %>% unique()

or (thanks to @alistaire)

data %>% separate_rows(x) %>% unique()
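As far as I know, newer tidyr releases (1.0.0 and later) expect `unnest()` to be given an existing list-column rather than an expression, so the first form above may warn or fail there. A hedged sketch of an equivalent that should behave the same on old and new versions, assuming tidyr is installed (`result` is an illustrative name):

```r
library(tidyr)  # assumption: tidyr is installed

data <- data.frame(x = c("a,b", "c", "a,b", "d,e,f,g"))

# Replace x with a list-column holding the pieces of each string
data$x <- strsplit(as.character(data$x), ",", fixed = TRUE)

# Expand the list-column to one row per piece, then drop duplicate rows
result <- unique(unnest(data, x))
print(result)
```

Building the list-column first keeps the `unnest()` call down to a plain column name, which is the part of its interface that has stayed stable.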

edited Jul 11 '16 at 21:20

answered Jul 11 '16 at 20:55

Hadley wrote a wrapper for that structure in 0.5.0: `data %>% separate_rows(x) %>% unique()` – alistaire Jul 11 '16 at 21:01
Thanks, I thought I'd seen something like that go by, but couldn't find it. – Ben Bolker Jul 11 '16 at 21:19
