I have been searching the internet, and Stack Overflow in particular, for a long time, but I cannot find an explanation of how to write a regex that finds multiple intersecting (overlapping) or non-intersecting substrings:

Suppose my original string is:
aabcdacdacdfghcdacds
and the substring to be fetched is:
cdacd
and I wish to find every occurrence of it, intersecting or not, as a captured group.
This means I want a regex that yields three groups from the original string:
group 1: (cdacd)
group 2: (cdacd)
group 3: (cdacd)
Notice that the cdacd matches for group 1 and group 2 in aabcdacdacdfghcdacds intersect: they share a cd.
Please advise.

  • Please state your actual problem. If you are doing log parsing, show actual data from your log and how it requires finding intersecting matches. (If you need to anonymise IPs or other relevant content, do so, as long as the problem remains recognisable). Your current description is a bit too abstract to show which approach would be best suited. – Amadan Jan 24 '17 at 05:32
  • Google -> Search "python overlapping regex" -> Top result. – Wiktor Stribiżew Jan 24 '17 at 08:02
  • @WiktorStribiżew Please see my question's edit. I want the regex to process the pattern as groups. I do not want to use findall() from python. This is more of a REGEX question rather than a python question. Merely because I am using the regex with python does not render my stance lesser. – vedlociraptor Jan 24 '17 at 09:08
  • I do not see any difference. Please explain your problem in the question. SHOUTING in the title is not a good idea (I am OK with it, but usually it is frowned upon). – Wiktor Stribiżew Jan 24 '17 at 09:12
  • That is not shouting sire @WiktorStribiżew. Please check. 1. The question that you have appended as the one for which mine may be duplicate does not process it in the regex. 2. It does not return OVERLAPPING patterns as GROUPS. (IF YOU WOULD HAVE READ MY QUESTION IN ENTIRETY YOU WOULD HAVE KNOWN WHAT I WANT!) 3. WHAT IT RETURNS IS ALL POSSIBLE SUB STRINGS OF SPECIFIED LENGTH! – vedlociraptor Jan 24 '17 at 09:17
  • And that *is* shouting. – Jongware Jan 24 '17 at 09:24
  • @RadLexus Can you please assess the difference in the question as suggested ? – vedlociraptor Jan 24 '17 at 09:45

1 Answer

Try it like this, wrapping the capturing group in a lookahead so that matches can overlap:

In [1]: import re
In [2]: re.findall('(?=(cdacd))', 'aabcdacdacdfghcdacds')
Out[2]: ['cdacd', 'cdacd', 'cdacd']

From the Python docs for the re module (search for (?=...)):

Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
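
To make the difference visible, here is a minimal sketch that compares a plain (consuming) pattern with the lookahead version on the string from the question. The variable names `text`, `needle`, `plain` and `overlapping` are only illustrative, and `re.finditer` is used here instead of `findall` so that the start offset of each captured group can be read as well:

```python
import re

text = "aabcdacdacdfghcdacds"
needle = "cdacd"

# A plain pattern consumes each match, so the search resumes after it
# and the occurrence that overlaps the first one is skipped.
plain = re.findall(re.escape(needle), text)

# The lookahead matches an empty string at each position where the needle
# starts; the capturing group inside it still records the needle itself.
pattern = re.compile("(?=(" + re.escape(needle) + "))")
overlapping = [(m.start(1), m.group(1)) for m in pattern.finditer(text)]

print(plain)        # ['cdacd', 'cdacd']
print(overlapping)  # [(3, 'cdacd'), (6, 'cdacd'), (14, 'cdacd')]
```

Because the lookahead itself consumes nothing, the regex engine advances one character at a time, which is why the occurrences starting at offsets 3 and 6 are both reported even though they share a cd.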

ShmulikA