Python, search for string occurrences included in < and >

Question

Consider a list of strings. I want to find all the substrings starting with < and ending with >.

How to do this?

I already tried to adapt a regular expression from this question: Regular expression to return text between parenthesis

But since I'm not familiar with regular expressions, none of my attempts were successful.

Note 1: I'm not set on using regex; any working solution is welcome.

Note 2: I'm not parsing HTML or any markup language

we are not here to solve your problems. Try for yourself and if you are stuck you can ask for help. Or use the search function, I'm sure there are a few similar problems. Keyword: Regex — Boendal, Dec 25 '19 at 23:02
You could maaaybe do it with regex, but if you're talking about structured text like HTML or XML you'll want a legit parser — Jared Smith, Dec 25 '19 at 23:02

score 3 · Answer 1 · answered Dec 25 '19 at 23:21

Using re.findall:

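import re
s = "x<first>x<second>x"  # example input; s can be any string
matches = re.findall(r"<(.*?)>", s)  # text between each < and >, brackets excluded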

I find RegExr to be a great site for tinkering with regex.
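Since the question is about a list of strings, here is a minimal sketch of applying the same call to each element. The sample list is made up for illustration. It also shows that the capture group controls whether the brackets are part of the result: with a group, re.findall returns only the captured text; without one, it returns the whole match.

```python
import re

# Hypothetical sample data, not taken from the question.
strings = ["x<first>x<second>x", "no brackets here", "x<third>x"]

# With a capture group, findall returns only the captured text.
inner = [re.findall(r"<(.*?)>", s) for s in strings]

# Without a group, findall returns the whole match, brackets included.
whole = [re.findall(r"<.*?>", s) for s in strings]

print(inner)  # [['first', 'second'], [], ['third']]
print(whole)  # [['<first>', '<second>'], [], ['<third>']]
```

Strings with no match simply give an empty list, so no special handling should be needed.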

erik


Jack Taylor · Answer 2 · 2019-12-27T12:07:12.160

This should do what you are looking for.

import re

strings = ["x<first>x<second>x", "x<third>x"]
result = [substring for string in strings for substring in re.findall(r"<.*?>", string)]
print(result)

Here, re.findall finds all the matches of the regular expression <.*?> in each string. A list comprehension loops over all the strings in the list, and then over all the matches in each string, collecting them into one flat list.
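If the comprehension is hard to read, it can be written out as two nested for loops, with the loop over the strings as the outer one. A minimal sketch of that expanded form, using the same sample list:

```python
import re

strings = ["x<first>x<second>x", "x<third>x"]

result = []
for string in strings:                              # outer loop: each string in the list
    for substring in re.findall(r"<.*?>", string):  # inner loop: each match in that string
        result.append(substring)

print(result)  # ['<first>', '<second>', '<third>']
```

In a comprehension the for clauses appear in the same order as these loops, outer first.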

By the way, why do you need to match angle brackets like this? If it's to parse HTML or XML, you would be better off using a dedicated parser, as writing your own regular expressions is error-prone, and regular expressions alone cannot deal with arbitrarily nested elements.

score 0 · Answer 3 · answered Dec 25 '19 at 23:17

You can do it with regex like this:

import re

regex = r"<([^>]*)>"

test_list = ["<hi how are you> I think <not anymore> whatever <amazing hi>", "second <first> <third>"]

for test_str in test_list:
    # No flags needed: re.MULTILINE only affects ^ and $, which this pattern does not use
    matches = re.finditer(regex, test_str)

    for matchNum, match in enumerate(matches, start=1):
        print("Match {matchNum} was found at {start}-{end}: {match}".format(
            matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

Output:

Match 1 was found at 0-16: <hi how are you>
Match 2 was found at 25-38: <not anymore>
Match 3 was found at 48-60: <amazing hi>
Match 1 was found at 7-14: <first>
Match 2 was found at 15-22: <third>

If you want to remove "<" and ">", you can do a string replace.
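Alternatively, since the pattern already contains a capture group, match.group(1) should give the text without the brackets. A small sketch, reusing the regex above with one of the test strings:

```python
import re

regex = r"<([^>]*)>"
test_str = "second <first> <third>"

for match in re.finditer(regex, test_str):
    # group(0) is the whole match, group(1) is the part inside the parentheses
    print(match.group(0), "->", match.group(1))

inner = [m.group(1) for m in re.finditer(regex, test_str)]
print(inner)  # ['first', 'third']
```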

But yeah, if you have structured text like HTML or XML, use a legit parser.
