Python Regular expression: Find adjacent characters

Question

I need a list containing every pair of adjacent characters in the string 'hello', like this:

['he', 'el', 'll', 'lo']

I thought I could do it this way:

>>> import re
>>> re.findall(r'..', 'hello')
['he', 'll']

That is not what I want. I need to get the list shown above using a regular expression.

@coldspeed: you couldn't have linked the dupe with the exact same question? :P https://stackoverflow.com/questions/11430863/how-to-find-overlapping-matches-with-a-regexp/18966698 — r.ook, Jan 13 '18 at 06:30
coldspeed is too eager to mark a question as duplicate. In my opinion to mark a question as duplicate needs more time to analyze and find the most appropriate duplicate. — , Jan 13 '18 at 06:40
To be fair, it was a thread he answered before (and recently edited), so it was natural for him to pick that one. I was just poking fun because I found the other link while I was looking up `re` with overlapping matches, and thought it was funny it was the exact same question. — r.ook, Jan 13 '18 at 06:49

r.ook · Accepted Answer · 2018-01-13T06:29:41.603

5

Good news! Your question is an exact duplicate of this one, which gives you the exact regex needed:

>>> re.findall(r'(?=(\w\w))', 'hello')
['he', 'el', 'll', 'lo']

Read the linked thread for more logic behind it.
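In short, `(?=...)` is a zero-width lookahead: the match itself consumes no characters, so the scan moves forward one position at a time while the capture group records the pair. Here is a minimal sketch that generalizes this to n characters. The name `adjacent_re` is hypothetical, and it assumes you want any character rather than only `\w` characters, so it uses `.` with `re.DOTALL`:

```python
import re

def adjacent_re(text, n=2):
    """Return every run of n adjacent characters in text (hypothetical helper)."""
    # The lookahead consumes nothing, so matches can overlap;
    # the inner group captures the n characters starting at each position.
    pattern = r'(?=(.{%d}))' % n
    # DOTALL lets '.' match newlines too (an assumption about what you want).
    return re.findall(pattern, text, flags=re.DOTALL)

print(adjacent_re('hello'))     # ['he', 'el', 'll', 'lo']
print(adjacent_re('hello', 3))  # ['hel', 'ell', 'llo']
```

Note that the `\w\w` version skips pairs containing spaces or punctuation, so `re.findall(r'(?=(\w\w))', 'a b')` returns an empty list, while the sketch above returns `['a ', ' b']`.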

Original Answer:

No need for regex. You can use list comprehension for that.

h = 'hello'

a = [h[i:i+2] for i in range(len(h)-1)]

Result:

['he', 'el', 'll', 'lo']

Edit: RoadRunner's zip/map solution is more elegant. That said, this solution is scalable: if you want, you can get more than just 2 adjacent characters:

func = lambda my_list, n: [my_list[i:i+n] for i in range(len(my_list)-n+1)]

# OR, as RoadRunner suggested a cleaner read if you don't like lambdas:

def func(my_list, n): return [my_list[i:i+n] for i in range(len(my_list)-n+1)]

This will give you:

>>> func('hello', 2)
['he', 'el', 'll', 'lo']
>>> func('hello', 3)
['hel', 'ell', 'llo']
>>> func('hello', 4)
['hell', 'ello']

edited Jan 13 '18 at 06:29

answered Jan 13 '18 at 05:50

r.ook


Your answer seems to be better than @RoadRunner's because I can extend it to solve my real problem. The question asked here is not my exact question. But why does re.findall(r'..', 'hello') not return every possible pair of adjacent characters, given that .. matches any two adjacent characters in re? Can you do it using re? – Jan 13 '18 at 06:00
The last edit you made is exactly what I want, though I didn't write it in the question, so I have accepted your answer. But can you provide it using re? – Jan 13 '18 at 06:05

I'm not much of a regex user, but it seems that once `re` has a match, it continues from the character *after* the match, that's why it continued from 'he' to 'll' instead of 'el'. When I ran `re.finditer(r'..','hello')` the iterable returned this: `<_sre.SRE_Match object; span=(0, 2), match='he'>, <_sre.SRE_Match object; span=(2, 4), match='ll'>`. As you can see, the span went from 0,2 to 2,4. – r.ook Jan 13 '18 at 06:08
@skilledDt: Based on the [`re` documentation](https://docs.python.org/3/library/re.html#re.findall), `findall` returns non-overlapping matches, that's why it wouldn't work. There *might* be a function for overlapping matches, but I haven't seen it in the documentation. – r.ook Jan 13 '18 at 06:19
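To see the non-overlapping behavior described in these comments, the spans can be printed side by side. This is only an illustrative sketch; the variable names are made up for the example:

```python
import re

text = 'hello'

# Plain pattern: each match consumes two characters,
# so the next search starts right after the previous match ends.
plain = [(m.span(), m.group()) for m in re.finditer(r'..', text)]

# Lookahead pattern: the match itself is empty (zero width),
# so the search only advances one position; the pair is in group 1.
overlapping = [(m.start(), m.group(1)) for m in re.finditer(r'(?=(..))', text)]

print(plain)        # [((0, 2), 'he'), ((2, 4), 'll')]
print(overlapping)  # [(0, 'he'), (1, 'el'), (2, 'll'), (3, 'lo')]
```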

RoadRunner · Answer 2 · 2018-01-13T06:22:36.563

You don't need regex here, you can do this easily with zip():

>>> s = "hello"
>>> [x + y for x, y in zip(s, s[1:])]
['he', 'el', 'll', 'lo']

Or even a functional approach with map():

>>> list(map(lambda x, y: x + y, s, s[1:]))
['he', 'el', 'll', 'lo']
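The zip() idea can also be stretched to n characters by zipping n shifted copies of the string. This is a rough sketch rather than part of the original answer, and `zip_windows` is a hypothetical name:

```python
def zip_windows(s, n):
    """Return every run of n adjacent characters using zip (hypothetical helper)."""
    # s[0:], s[1:], ..., s[n-1:] are the string shifted by 0..n-1 positions;
    # zip stops at the shortest one, which gives exactly len(s) - n + 1 windows.
    shifted = (s[i:] for i in range(n))
    return [''.join(chars) for chars in zip(*shifted)]

print(zip_windows('hello', 2))  # ['he', 'el', 'll', 'lo']
print(zip_windows('hello', 3))  # ['hel', 'ell', 'llo']
```

Each shifted copy is a new slice of the string, so for large n a single slicing comprehension is probably cheaper.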

If you want a way to handle any number of adjacent characters, you could use a sliding window: take the first n characters, pop the first character, and repeat until fewer than n characters remain.

Here is an example:

from collections import deque
from itertools import islice

def every_n(s, n):
    result = []

    items = deque(s)
    while len(items) >= n:
        result.append(''.join(islice(items, 0, n)))
        items.popleft()

    return result

Which works as follows:

>>> print(every_n('hello', 2))
['he', 'el', 'll', 'lo']
>>> print(every_n('hello', 3))
['hel', 'ell', 'llo']
>>> print(every_n('hello', 4))
['hell', 'ello']
>>> print(every_n('hello', 5))
['hello']
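A variation on the same sliding window, sketched here as an alternative rather than a replacement: a deque with `maxlen=n` drops the oldest character automatically, so the function can be a generator and accept any iterable of characters. The name `windows` is hypothetical:

```python
from collections import deque

def windows(iterable, n):
    """Yield every run of n adjacent characters (hypothetical helper)."""
    # A bounded deque discards its oldest item once it holds n items.
    window = deque(maxlen=n)
    for char in iterable:
        window.append(char)
        # Only emit once the window has filled up.
        if len(window) == n:
            yield ''.join(window)

print(list(windows('hello', 2)))  # ['he', 'el', 'll', 'lo']
print(list(windows('hello', 4)))  # ['hell', 'ello']
```

Because it never indexes or slices the input, this version also works on streams such as a file read character by character.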
