How to match identical characters in regex in R

Asked Feb 07 '20 at 16:31

Active Feb 07 '20 at 17:05

Viewed 113 times

I can't seem to figure out how to match identical characters in regex in R. Suppose I have this data:

dt <- c("12345", "asdf", "#*§", "AAAA", ";;;;", "9999", "%:=+")

I'm able to extract every run of 4 non-whitespace characters, for example like this:

pattern <- "\\S{4}"
extract <- function(x) unlist(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
extract(dt)
[1] "1234" "asdf" "AAAA" ";;;;" "9999" "%:=+"

But what I really want to match are those strings in which the same character is repeated 4 times, giving this output:

[1] "AAAA" ";;;;" "9999"

Any ideas?

asked Feb 07 '20 at 16:31

Chris Ruehlemann



Change the quantifier in [this](https://stackoverflow.com/q/38263441/5325862) to `{3}` – camille Feb 07 '20 at 16:38

Try a capture group and reference it... It gets a little tricky because you need to capture it then use a back-reference to look for 3 more like this: `(\\S)\\1{3}` so you have 4 characters total. – dvo Feb 07 '20 at 16:39
@dvo Thanks a lot: `grep("(\\S)\\1{3}", dt, value = T)` outputs `[1] "AAAA" ";;;;" "9999"`, i.e., the desired result – Chris Ruehlemann Feb 07 '20 at 16:50
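
For anyone landing here, the back-reference idea from the comments can be sketched roughly as below. It reuses `dt` from the question. The names `hits`, `runs`, `exact` and `extract_runs` are made up for illustration, and the anchored variant at the end is an assumption about what "exactly four identical characters" might mean, not something the question states.

```r
dt <- c("12345", "asdf", "#*§", "AAAA", ";;;;", "9999", "%:=+")

# (\\S) captures one non-whitespace character; \\1{3} requires that same
# character three more times, i.e. four identical characters in a row.
pattern <- "(\\S)\\1{3}"

# Keep the elements that contain such a run anywhere in the string.
hits <- grep(pattern, dt, value = TRUE)

# Extract only the matched run itself, mirroring extract() from the question.
extract_runs <- function(x) unlist(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
runs <- extract_runs(dt)

# Assumption: if the whole string must consist of exactly four identical
# characters, anchor the pattern with ^ and $. The two extra test strings
# are hypothetical and would pass the unanchored pattern.
exact <- grep("^(\\S)\\1{3}$", c(dt, "AAAAA", "xAAAA"), value = TRUE)
```

On the sample data all three results should come out the same. The anchors only make a difference for inputs such as `"AAAAA"` or `"xAAAA"`, where the run of four sits inside a longer string.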
