How to use regular expressions to find all sub-strings?

Question

How do we find all length-n sub-strings in a string? Suppose the string is 'Jonathan'. All length-3 sub-strings are then:

 'Jon','ona',...'han'

I would like to use regex for this. I tried `re.findall('...', 'Jonathan')`, but it didn't give me what I wanted.

From https://stackoverflow.com/questions/11430863/how-to-find-overlapping-matches-with-a-regexp - `re.findall(r'(?=(\w\w\w))', 'Jonathan')` — , Jun 04 '19 at 03:44
@Chris - How is this question a duplicate of the above-stated link? They look like two different questions to me. I mean, how does it answer the OP's question? The one mentioned by Justin Ezequiel seems duplicate to me. — Justin, Jun 04 '19 at 13:33

Justin · Answer 1 · 2019-06-04T16:05:33.147

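If you really want to use regex for this, I suggest a lookahead containing a capturing group. The lookahead matches without consuming any characters, so the captured sub-strings can overlap -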

import re
print(re.findall(r'(?=(\w\w\w))', 'Jonathan'))

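You can increase or decrease the number of \w's to match the sub-string length n you want. Note that \w matches only letters, digits and underscores, so sub-strings containing spaces or punctuation are skipped.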

Output -

['Jon', 'ona', 'nat', 'ath', 'tha', 'han']
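To see why the lookahead is needed, here is a minimal comparison on the same string. A plain `...` pattern consumes the three characters it matches, so the next search starts after them. A zero-width lookahead consumes nothing, so the scan moves forward one character at a time. The variable names are just for illustration.

```python
import re

s = 'Jonathan'

# Plain pattern: each match consumes three characters,
# so matches cannot overlap and the trailing 'an' is dropped.
plain = re.findall(r'...', s)

# Lookahead: the match itself is empty; the capturing group
# records the three characters that follow each position.
overlapping = re.findall(r'(?=(...))', s)

print(plain)        # ['Jon', 'ath']
print(overlapping)  # ['Jon', 'ona', 'nat', 'ath', 'tha', 'han']
```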

Another example -

print(re.findall(r'(?=(\w\w\w\w))', 'Jonathan'))

Output -

['Jona', 'onat', 'nath', 'atha', 'than']
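Rather than typing one \w per character, the length can be passed in as a parameter. This is only a sketch: `substrings_regex` is a made-up helper name, and it uses `.` with `re.DOTALL` instead of `\w` on the assumption that every character (spaces, punctuation, newlines) should count.

```python
import re

def substrings_regex(s, n):
    """Return all overlapping length-n sub-strings of s (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Build '(?=(.{n}))': a lookahead capturing the next n characters.
    # re.DOTALL lets '.' match newlines as well.
    pattern = re.compile(r'(?=(.{%d}))' % n, re.DOTALL)
    return pattern.findall(s)

print(substrings_regex('Jonathan', 3))  # ['Jon', 'ona', 'nat', 'ath', 'tha', 'han']
print(substrings_regex('Jonathan', 4))  # ['Jona', 'onat', 'nath', 'atha', 'than']
```

If n is larger than the string, the lookahead never succeeds and the result is an empty list.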

Hope this helps!

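Following your recent comment, to find sub-strings matching a pattern such as '.m.', you can pass the pattern to findall directly (note that this returns non-overlapping matches only) -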

Example 1 -

import re
s = "amam"
m = re.compile(".m.")
h = m.findall(s)
print(h)

Output -

['ama']

Example 2 -

import re
s = "Jonathan"
m = re.compile(".o.")
h = m.findall(s)
print(h)

Output -

['Jon']

Example 3 -

import re
s = "Jonathanona"
m = re.compile(".o.")
h = m.findall(s)
print(h)

Output -

['Jon', 'non']
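The three examples above return non-overlapping matches, which happens to be enough for those strings. If overlapping matches of a pattern like '.o.' are also wanted, the lookahead trick from the first part of the answer can be combined with the pattern. A rough sketch, where `find_overlapping` is a hypothetical helper and the pattern is assumed to contain no capturing groups of its own:

```python
import re

def find_overlapping(pattern, s):
    """Hypothetical helper: all matches of pattern in s, overlaps included."""
    # Wrap the pattern in a lookahead with a single capturing group.
    # Assumes `pattern` has no capturing groups; otherwise findall
    # would return tuples instead of strings.
    return re.findall('(?=(%s))' % pattern, s)

s = 'banana'
print(re.findall('.a.', s))        # ['ban']
print(find_overlapping('.a.', s))  # ['ban', 'nan']
```

Here 'nan' starts inside the first match 'ban', so plain findall never sees it.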

Hope this helps!

edited Jun 04 '19 at 16:05

answered Jun 04 '19 at 05:02

Justin

@tripleee - I used it since OP wanted to use regular expressions. But I also agree that using regex seems very roundabout. – Justin Jun 04 '19 at 05:33
@Justin: what if I wanted to find all '.o.' patterns in Jonathan? should I use r'(?=(\w o \w))' ? – Sina Jun 04 '19 at 14:20
@Sina - Could you show an example output? – Justin Jun 04 '19 at 14:37
@Justin: take string='amam'. suppose I am looking for a substring that matches '.m.' . It would be 'ama' in this case. – Sina Jun 04 '19 at 15:32
@Sina - Look at my edit. – Justin Jun 04 '19 at 16:05

score 0 · Answer 2 · answered Jun 04 '19 at 03:43

You don't need a regex for that. Use zip:

name = 'Jonathan'

print([x + y + z for x, y, z in zip(name, name[1:], name[2:])])
# ['Jon', 'ona', 'nat', 'ath', 'tha', 'han']

Austin

Maybe even simpler: `print([name[i:i+3] for i in range(len(name)-2)])` – enumaris Jun 04 '19 at 05:07
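The slicing version generalizes to any length n with a small change to the range. A minimal sketch, with `substrings` as a made-up function name:

```python
def substrings(s, n):
    """Return all overlapping length-n sub-strings of s (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # There are len(s) - n + 1 start positions; if n > len(s),
    # the range is empty and so is the result.
    return [s[i:i + n] for i in range(len(s) - n + 1)]

print(substrings('Jonathan', 3))  # ['Jon', 'ona', 'nat', 'ath', 'tha', 'han']
print(substrings('Jonathan', 4))  # ['Jona', 'onat', 'nath', 'atha', 'than']
```

Unlike the `\w`-based regex, this treats every character the same, so spaces and punctuation are included in the windows.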
