How do I return a string from a regex match in python?

Question

I am looping over the lines of a text file in a Python script. I want to search each line for an img tag and return the tag as text.

When I run re.match on a line, it returns a _sre.SRE_Match object. How do I get it to return a string?

import sys
import string
import re

f = open("sample.txt", 'r' )
l = open('writetest.txt', 'w')

count = 1

for line in f:
    line = line.rstrip()
    imgtag  = re.match(r'<img.*?>',line)
    print("yo it's a {}".format(imgtag))

When run it prints:

yo it's a None
yo it's a None
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e5e0>
yo it's a None
yo it's a None

wflynny · Accepted Answer · 2013-08-28T16:49:49.977

score 145

Call group(0) on the match object that re.match returns. For example:

imgtag = re.match(r'<img.*?>', line).group(0)

Edit:

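re.match returns None when the line does not match, and calling group on None raises an AttributeError, so you might be better off doing something like

imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))

to skip the lines that produce None.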

edited Aug 28 '13 at 16:49

answered Aug 28 '13 at 16:44

wflynny


See http://docs.python.org/2/library/re.html#match-objects – stalepretzel Aug 28 '13 at 16:44
I tried the code as shown above, but got a return value of None. When I changed the method from 'match' to 'search', I got the expected result. Not sure why this is...? – Bernard Esterhuyse Nov 18 '20 at 10:34
[Match is anchored to the start of the line](https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match). – wflynny Nov 18 '20 at 17:43
`imgtag.group()` without indexes also works – guesswho Aug 12 '21 at 04:57

score 11 · Answer 2 · answered Aug 28 '13 at 16:45


Use imgtag.group(0) or imgtag.group(). Either one returns the entire match as a string. Your pattern has no capture groups, so there is nothing else to retrieve.

http://docs.python.org/release/2.5.2/lib/match-objects.html
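
To illustrate the point about the whole match versus capture groups, here is a minimal sketch. The sample line and the second pattern (with a capture group around the src value) are assumptions made up for the example; they are not from the question.

```python
import re

# Hypothetical sample line, not taken from the asker's sample.txt.
line = '<img src="cat.png" alt="cat"> some caption'

# No capture groups: group() and group(0) both return the whole match.
m = re.match(r'<img.*?>', line)
whole = m.group(0)
same = m.group()

# With one capture group, group(1) returns only the captured part,
# while group(0) still returns the whole match.
m2 = re.match(r'<img.*?src="(.*?)".*?>', line)
src = m2.group(1)

print(whole)
print(src)
```

Both calls on `m` give the full tag as a plain `str`; numbered groups only matter once the pattern contains parentheses.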

answered Aug 28 '13 at 16:45

Explosion Pills


score 10 · Answer 3 · answered Apr 24 '17 at 08:09

Note that re.match(pattern, string, flags=0) only returns a match that starts at the beginning of the string. If you want to locate a match anywhere in the string, use re.search(pattern, string, flags=0) instead (https://docs.python.org/3/library/re.html). It scans the string and returns a match object for the first match, or None if there is none. You can then extract the matching string with match_object.group(0), as the other answers suggest.
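
A short sketch of that difference, using two made-up lines. The helper name `first_img_tag` is hypothetical and only there for illustration:

```python
import re

pattern = r'<img.*?>'

# Hypothetical lines for illustration.
at_start = '<img src="a.png"> trailing text'
in_middle = 'leading text <img src="b.png"> trailing text'

def first_img_tag(line):
    """Return the first img tag found anywhere in line, or None."""
    m = re.search(pattern, line)
    return m.group(0) if m else None

# re.match only succeeds when the tag is at position 0.
match_start = re.match(pattern, at_start)
match_middle = re.match(pattern, in_middle)  # no match: tag is not at the start

print(match_start.group(0))
print(match_middle)
print(first_img_tag(in_middle))
```

This would explain output like the asker's, where many lines print None: any line whose tag is preceded by other text (including indentation) does not match at position 0.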

score 8 · Answer 4 · answered Aug 28 '13 at 17:01

Considering there might be several img tags I would recommend re.findall:

import re

with open("sample.txt", 'r') as f_in, open('writetest.txt', 'w') as f_out:
    for line in f_in:
        for img in re.findall(r'<img[^>]+>', line):
            # write() works on both Python 2 and 3, unlike "print >> f_out"
            f_out.write("yo it's a {}\n".format(img))
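
As a rough illustration of what re.findall gives back, here is a sketch on a single made-up line with two tags. When the pattern has no capture groups, findall returns a list of plain strings; re.finditer is shown as an alternative for when the match positions are needed too.

```python
import re

# Hypothetical line containing two img tags.
line = '<p><img src="a.png"> and <img src="b.png"></p>'
pattern = r'<img[^>]+>'

# findall returns the matched text directly, as a list of strings.
tags = re.findall(pattern, line)

# finditer yields match objects, so the start offset is available as well.
tags_with_offsets = [(m.group(0), m.start()) for m in re.finditer(pattern, line)]

for tag in tags:
    print(tag)
```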
