I am working on a regex match function in Python. I have the following code:
import re

def src_match(line, img):
    # Capture the src value when src is the first attribute of an <img> tag.
    imgmatch = re.search(r'<img src="(?P<img>.*?)"', line)
    if imgmatch and imgmatch.groupdict()['img'] == img:
        print 'the match was:', imgmatch.groupdict()['img']
The above does not seem to work correctly for me at all. On the other hand, I do have luck with this:
def href_match(line, url):
    hrefmatch = re.search(r'<a href="(?P<url>.*?)"', line)
    if hrefmatch and hrefmatch.groupdict()['url'] == url:
        print 'the match was:', hrefmatch.groupdict()['url']
    else:
        return None
Can someone please explain why this would be (or whether both should work)? For example, is there something special about the identifier in the href_match() function? You can assume that in both functions I am passing in a line that contains the string I am searching for, along with the string itself.
EDIT: I should mention that I am sure I will never get a tag like:
<img width="200px" src="somefile.jpg">
The reason is that I am using a specific program to generate the HTML, and it will never produce such a tag. That example is purely theoretical; assume that I am always going to get a tag like:
<img src="somefile.jpg">
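To make that assumption concrete, here is a minimal sketch of how the pattern from src_match behaves on the two tag shapes. The helper name capture_src is hypothetical and only for illustration; the pattern itself is unchanged.

```python
import re

# Same pattern as in src_match: the literal text '<img src="' must appear,
# so src has to be the first attribute of the tag.
IMG_PATTERN = re.compile(r'<img src="(?P<img>.*?)"')

def capture_src(line):
    """Return the captured src value, or None if the pattern does not match."""
    match = IMG_PATTERN.search(line)
    return match.group('img') if match else None

first_attr = capture_src('<img src="somefile.jpg">')
other_order = capture_src('<img width="200px" src="somefile.jpg">')

print('src as first attribute: %r' % (first_attr,))
print('src after width:        %r' % (other_order,))
```

With src first, the group holds somefile.jpg; with width first, the search finds nothing and the helper returns None, which is why the attribute order matters for this pattern.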
EDIT:
Here is an example of a line that I am feeding to the function, which does not match the input argument:
<p class="p1"><img src="myfile.anotherword.png" alt="beat-divisions.tiff"></p>
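For reference, a quick check of what the src_match pattern captures from that exact line. This is only a sketch: the variable names are made up for illustration, and the two comparison strings are the src and alt values copied from the line above, not necessarily what I actually pass as img.

```python
import re

line = '<p class="p1"><img src="myfile.anotherword.png" alt="beat-divisions.tiff"></p>'

# Same pattern as in src_match; the lazy .*? stops at the first closing quote.
imgmatch = re.search(r'<img src="(?P<img>.*?)"', line)
captured = imgmatch.group('img') if imgmatch else None
print('captured: %r' % (captured,))

# src_match compares the captured text to img with ==, so only an exact
# string match on the src value passes the test.
same_as_src = (captured == 'myfile.anotherword.png')
same_as_alt = (captured == 'beat-divisions.tiff')
print('equals src value: %s' % same_as_src)
print('equals alt value: %s' % same_as_alt)
```

So for this line the group contains the src value only, and the equality test in src_match can succeed only if img is exactly that string.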