Using the findall function from Python's re module

Question

I wrote code like this:

>>> import re
>>> url='<a href="C:\python34\koala.jpg">jpg</a><font size="10">'
>>> print(re.findall('href="(.*?)"><',url))

I expected this result:

C:\python34\koala.jpg">jpg</a

But I only get this result:

[]

Why is this happening?

I don't understand why I get this result in the console.
I am using Python 3.4 on Windows 8.1.

Why would you expect that when your capturing group is inside quotes? — AKS, May 12 '16 at 06:38
Never use regex to parse HTML: http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not — Garf365, May 12 '16 at 07:37
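
Following Garf365's comment, here is a minimal sketch of the same extraction done with the standard library's html.parser.HTMLParser instead of a regular expression. HrefCollector is a hypothetical class name chosen for illustration; it only records the href value of each <a> tag and ignores everything else.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


# Raw string: same text as the question's literal, without escape warnings
url = r'<a href="C:\python34\koala.jpg">jpg</a><font size="10">'
parser = HrefCollector()
parser.feed(url)
parser.close()
print(parser.hrefs)  # ['C:\\python34\\koala.jpg']
```

The parser handles quoting and tag boundaries itself, so there is no pattern to get wrong when the markup changes slightly.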

score 2 · Answer 1 · answered May 12 '16 at 06:39

Are you sure you want the ">jpg</a part too? If so, you can use this:

>>> re.findall('href="([^"]*">[^<]*</a)',url)
['C:\\python34\\koala.jpg">jpg</a']
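To see why the question's pattern returned [], it helps to test the pieces separately. This minimal sketch uses raw strings, which give the same text as the question's literal but avoid the warnings newer Python versions emit for unrecognized escapes such as \p. It shows that the pattern requires the three characters "><, which never occur in url, and that dropping the quote produces the result the question expected.

```python
import re

url = r'<a href="C:\python34\koala.jpg">jpg</a><font size="10">'

# The original pattern needs the literal text "><  (quote, >, <) after the
# captured part. That sequence never occurs in url, so nothing matches.
has_sequence = '"><' in url
original = re.findall(r'href="(.*?)"><', url)
print(has_sequence)  # False
print(original)      # []

# Without the quote, the lazy .*? stops at the first >< in the text,
# which comes right after </a, giving the result the question expected.
fixed = re.findall(r'href="(.*?)><', url)
print(fixed)         # ['C:\\python34\\koala.jpg">jpg</a']
```

The answer's [^"]* and [^<]* classes avoid depending on where a lazy .*? happens to stop: each class simply cannot run past the next quote or the next tag.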

If you only need the value of the href attribute, you can use:

>>> re.findall('href="([^"]*)"',url)
['C:\\python34\\koala.jpg']
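
What findall returns depends on how many capturing groups the pattern has: with one group you get a list of strings, with several you get a list of tuples. This sketch uses a made-up fragment with a second, hypothetical link (tulips.png) to show both cases.

```python
import re

# Hypothetical fragment with two links, made up for illustration
html = r'<a href="C:\python34\koala.jpg">jpg</a> <a href="C:\python34\tulips.png">png</a>'

# One capturing group: a list with the group's text for each match
hrefs = re.findall(r'href="([^"]*)"', html)
print(hrefs)  # ['C:\\python34\\koala.jpg', 'C:\\python34\\tulips.png']

# Two capturing groups: a list of (group 1, group 2) tuples, one per match
pairs = re.findall(r'href="([^"]*)">([^<]*)</a>', html)
print(pairs)  # [('C:\\python34\\koala.jpg', 'jpg'), ('C:\\python34\\tulips.png', 'png')]
```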
