I wrote code like this:

>>> import re
>>> url='<a href="C:\python34\koala.jpg">jpg</a><font size="10">'
>>> print(re.findall('href="(.*?)"><',url))

I expected this result:

C:\python34\koala.jpg">jpg</a 

But I only get this result:

[]

Why is this happening?

I don't understand why I get this result in the console.
Please help me.
I am using Python 3.4 on Windows 8.1.

LoicTheAztec
L.kyunam
  • Why would you expect that when your capturing group is inside the quotes? – AKS May 12 '16 at 06:38
  • Never use regex to parse HTML: http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not – Garf365 May 12 '16 at 07:37

1 Answer

Are you sure you want the ">jpg</a part too? If so, you can use this:

>>> re.findall('href="([^"]*">[^<]*</a)',url)
['C:\\python34\\koala.jpg">jpg</a']
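As for why the original pattern returned an empty list: it requires the literal sequence ">< right after the captured text, and that sequence never occurs in this input. The closing quote of the href is followed by >jpg, and the last "> sits at the end of the string. The sketch below illustrates this; the raw string and the variable names `original` and `relaxed` are illustrative choices, not part of the question.

```python
import re

# Raw string so the backslashes in the Windows path are kept literally.
url = r'<a href="C:\python34\koala.jpg">jpg</a><font size="10">'

# Original pattern: needs a quote, then '>', then '<' directly after the group.
# No such sequence exists in url, so nothing matches.
original = re.findall(r'href="(.*?)"><', url)

# Dropping the quote from the terminator lets the lazy group run up to the
# first '><', which is the boundary between </a> and <font ...>.
relaxed = re.findall(r'href="(.*?)><', url)

print(original)
print(relaxed)
```

The relaxed pattern happens to reproduce the output the question asked for, but it depends on another tag following </a> immediately, so it is probably too brittle for general use.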

If you need only the href attribute's value, you can use:

>>> re.findall('href="([^"]*)"',url)
['C:\\python34\\koala.jpg']
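As the comments on the question point out, regular expressions are fragile for HTML. A possible alternative is the standard library's html.parser module. The following is only a minimal sketch: `HrefCollector` is a hypothetical helper class written for this example, and it assumes you only care about href attributes on <a> tags.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Raw string so the backslashes in the Windows path are kept literally.
url = r'<a href="C:\python34\koala.jpg">jpg</a><font size="10">'

parser = HrefCollector()
parser.feed(url)
parser.close()

hrefs = parser.hrefs
print(hrefs)
```

Because the parser handles quoting and tag boundaries itself, the result should not change if attributes are reordered or extra whitespace appears inside the tag.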
riteshtch