
For example, let's take the sequence

"aaaaaa".

I want a regex to match all two-character substrings of repeated characters, including overlapping ones. That means the total count of matches should be 5 instead of 3.
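For reference, here is where the count of 3 comes from. This is a minimal Java sketch; the class and method names are illustrative and not from the question. A plain pattern "aa" consumes both characters of each match, so the next search starts after them and the matches cannot overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaivePairs {
    // Returns the start index of every match of the plain pattern "aa".
    // Each match consumes both characters, so find() continues after them.
    static List<Integer> matchStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile("aa").matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        List<Integer> starts = matchStarts("aaaaaa");
        // Prints: 3 matches at [0, 2, 4]
        System.out.println(starts.size() + " matches at " + starts);
    }
}
```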

Clarification:

Let's number the characters. The sequence then looks like

"a1a2a3a4a5a6"

All such substrings are:

"a1a2", "a2a3", "a3a4", "a4a5", "a5a6"

Can I do that with a regex? I am currently programming in Java, and I know it is possible to write an algorithm for this there, but I would like to avoid that for now.

Alex
  • http://stackoverflow.com/questions/5616822/python-regex-find-all-overlapping-matches can help you start. – Wiktor Stribiżew Apr 01 '16 at 11:42
  • I understand that this is possible within Java, but it is going to be quite messy with my current skills, so I was asking whether it is an option in regex, because it seems logical to me to have such an option. I'll edit to be clearer. – Alex Apr 01 '16 at 12:54
  • `(?=((a)\2))` - the values are in Group 1. – Wiktor Stribiżew Apr 01 '16 at 12:55
  • @WiktorStribiżew Thanks, it works. However, can you please explain how exactly? I looked up positive lookahead, but I still can't understand how your expression works. – Alex Apr 01 '16 at 16:39
  • What is the programming language? What method are you using? – Wiktor Stribiżew Apr 01 '16 at 16:40
  • Ok, I posted an answer. It is a bit more complicated than what I used to close this question with. – Wiktor Stribiżew Apr 01 '16 at 17:58

1 Answer


You can use the following regex:

(?=((a)\2))


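The technique of capturing overlapping substrings inside a positive lookahead is described here. A lookahead is zero-width: it checks that the text ahead matches without consuming it, so each match is empty and the engine tries again at the very next position, which is what lets the pairs overlap. A capturing group inside the lookahead still records the text it looked at.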
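A minimal Java sketch of this technique, assuming the pattern from this answer; the class OverlappingPairs and its method are hypothetical names. Inside a Java string literal, \2 has to be written as \\2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingPairs {
    // The answer's regex; \2 must be escaped as \\2 in a Java string literal.
    private static final Pattern PAIR = Pattern.compile("(?=((a)\\2))");

    // Returns "value@start" for every match, reading the value from Group 1.
    static List<String> findPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // The overall match m.group() is empty because the lookahead
            // consumes nothing; after an empty match, find() retries one
            // position further on, so consecutive pairs can overlap.
            pairs.add(m.group(1) + "@" + m.start());
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> pairs = findPairs("aaaaaa");
        // Prints: 5 [aa@0, aa@1, aa@2, aa@3, aa@4]
        System.out.println(pairs.size() + " " + pairs);
    }
}
```

The loop is the usual Matcher.find() idiom; the only difference from the plain "aa" search is that the values are read from group(1) instead of group().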

The difference is that you need two capturing groups: an inner, purely "functional" group (Group 2) that captures one "a" so that the backreference \2 can require the same symbol again, which makes sure we match two identical consecutive symbols, and an outer group (Group 1) that wraps the whole pair and that we use to extract the values we need.
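To illustrate the role of each group, this hypothetical Java sketch compares the answer's pattern with a variant that drops the outer group, (?=(a)\1). The class and method names are illustrative only. Both patterns succeed at the same 5 positions, but only the version with the outer group captures the full pair.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupRoles {
    // Counts how many times find() succeeds on the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns the given group of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String withOuter = "(?=((a)\\2))"; // the answer's pattern
        String noOuter = "(?=(a)\\1)";     // hypothetical variant without the outer group

        System.out.println(countMatches(withOuter, "aaaaaa")); // 5
        System.out.println(firstGroup(withOuter, "aaaaaa", 1)); // aa (outer group: the pair)
        System.out.println(firstGroup(withOuter, "aaaaaa", 2)); // a  (inner, functional group)

        System.out.println(countMatches(noOuter, "aaaaaa"));    // 5
        System.out.println(firstGroup(noOuter, "aaaaaa", 1));   // a  (the pair itself is not captured)
    }
}
```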

Wiktor Stribiżew