How to get all substrings between some delimiters in Python

Question

I am trying to get all the substrings that are enclosed by delimiters. My issue is that I also need the character that ends the previous occurrence, because it also starts the next one. The strings need to be between any of these characters: . / ? = - _

I have tried this regular expression:

pattern = re.compile(r"""[./?=\-_][^./?=\-_]+[./?=\-_]""")

In this example:

-facebook=chat.messenger?

I am not able to get the substring =chat.

I am only getting -facebook= and .messenger?
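For reference, a minimal reproduction of the behavior described above, assuming Python 3 and the standard re module (the variable names are only for illustration). It suggests the middle substring is lost because each match consumes its closing delimiter, so the next match cannot start on it:

```python
import re

# Pattern from the question: delimiter, run of non-delimiters, delimiter.
pattern = re.compile(r"[./?=\-_][^./?=\-_]+[./?=\-_]")
s = "-facebook=chat.messenger?"

# findall scans left to right and resumes after the end of each match,
# so the "=" consumed by the first match cannot open a second match.
matches = pattern.findall(s)
print(matches)  # ['-facebook=', '.messenger?']
```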

Dupe of [Python regex find all overlapping matches?](https://stackoverflow.com/questions/5616822/python-regex-find-all-overlapping-matches) — Wiktor Stribiżew, Jun 21 '19 at 21:43

cisco · Answer 1 · 2019-06-21T23:55:02.477

1

Looks like the overlap is what's causing some of the drama. If you use the third-party regex module (which is backwards-compatible with the standard re module and adds features such as overlapped matching), you can do

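import regex  # third-party package: pip install regex

delimiters = r'[./?=\-_]'
pattern = delimiters + r'[a-z]+' + delimiters
s = '-facebook=chat.messenger?'

# overlapped=True allows a match to start inside the previous match
print(regex.findall(pattern, s, overlapped=True))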
# ['-facebook=', '=chat.', '.messenger?']

Note that [a-z] assumes the text between delimiters is all lowercase letters, and that [./?=\-_] is the set of delimiters you specified.
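If the text between delimiters may contain uppercase letters or digits, one possible variation is to match any run of non-delimiter characters instead of [a-z]. The sketch below swaps in a different technique from the one above: it uses the standard re module with a lookahead rather than the regex module's overlapped flag. The name DELIMS and the sample string are assumptions made for illustration.

```python
import re

# Assumption: same delimiter set as in the question.
DELIMS = r"./?=\-_"

# The lookahead matches at a position without consuming any characters,
# so the closing delimiter of one hit can also open the next one.
# The capture group inside the lookahead holds the text that findall returns.
pattern = re.compile(r"(?=([{d}][^{d}]+[{d}]))".format(d=DELIMS))

s = "-Facebook=chat2.messenger?"
result = pattern.findall(s)
print(result)  # ['-Facebook=', '=chat2.', '.messenger?']
```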

Hope this helps!

edited Jun 21 '19 at 23:55

answered Jun 21 '19 at 23:20



Hey, thanks for your answer, but I didn't want to install a library. This is the regex that I actually used: (?=[/?=_–.-]([a-z]+)[/?=_–.-]) – Jun 22 '19 at 00:18
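A short sketch of how the regex from this comment behaves with the standard re module, assuming Python 3. The pattern is copied verbatim; the longer sample string is the one mentioned in the comments on the other answer, and the variable names are illustrative. Because the whole pattern sits inside a lookahead, each match is zero-width and only group 1 (the word without its delimiters) carries text:

```python
import re

# Regex quoted in the comment above, copied verbatim.
pattern = re.compile(r"(?=[/?=_–.-]([a-z]+)[/?=_–.-])")

s = "-facebook=chat.messenger?sdsdsdsds-"

# Each match is zero-width; group 1 holds the word between two delimiters.
words = [m.group(1) for m in pattern.finditer(s)]
# m.start() is the index of the opening delimiter of each word.
starts = [m.start() for m in pattern.finditer(s)]

print(words)   # ['facebook', 'chat', 'messenger', 'sdsdsdsds']
print(starts)  # [0, 9, 14, 24]
```

To get the delimiters as well, the capture group would need to wrap the delimiter classes too.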

Emma · Answer 2 · 2019-06-21T22:41:43.177

0

My guess is that this expression might be a good place to start:

((?:[/?=_–.-])([a-z]+)(?:[/?=_–.-]))|([a-z]+)


Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"((?:[/?=_–.-])([a-z]+)(?:[/?=_–.-]))|([a-z]+)"

test_str = "-facebook=chat.messenger?"

matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    # Report the full match and its position.
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

    # Report each capture group; a group that did not participate
    # shows up as None with start and end of -1.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(
            group_num, match.start(group_num), match.end(group_num),
            match.group(group_num)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
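One property of the alternation is worth checking on a small input: the second branch, ([a-z]+), also matches words that are not enclosed by delimiters on both sides. The sketch below assumes Python 3; the input string is hypothetical and chosen so that its first word has no delimiter in front of it.

```python
import re

# Pattern from the answer, copied verbatim.
regex = r"((?:[/?=_–.-])([a-z]+)(?:[/?=_–.-]))|([a-z]+)"

# Hypothetical input whose first word has no delimiter in front of it.
s = "abc-def="

# findall returns one tuple per match: (group 1, group 2, group 3),
# with an empty string for any group that did not participate.
rows = re.findall(regex, s)
print(rows)  # [('', '', 'abc'), ('-def=', 'def', '')]
```

Here abc is reported through group 3 even though nothing delimits it on the left, so a caller that wants only delimited words would have to filter on groups 1 and 2.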

edited Jun 21 '19 at 22:41

answered Jun 21 '19 at 21:46



Hey, I am getting it now. But is it possible to get facebook, chat and messenger, and not only chat? – Jun 21 '19 at 21:53

It works for this example, but the special characters need to be one of these: . / ? = - _ and the number of special characters is not fixed. – Jun 21 '19 at 22:08

Yes, but if I change it to -facebook=chat.messenger?sdsdsdsds- it won't get the last substring. Sorry for not being clear :p – Jun 21 '19 at 22:12

Hey, it's updated, but there could be lots of other inputs. There could be 1000 special characters and it should get all of the substrings between them. The number of substrings is not fixed. – Jun 21 '19 at 22:23

It needs to get all of the substrings between these characters: . / ? = - _ but we do not know how many substrings there could be. Right now, the answer you are giving me only works when there are 3 substrings. – Jun 21 '19 at 22:27

No, because it is matching strings that are not between the possible delimiters. – Jun 21 '19 at 22:32

It's okay, I made it work with this regex: (?=[/?=_–.-]([a-z]+)[/?=_–.-]) – Jun 21 '19 at 22:37
