How to use Python regular expression to get Image src?

Question

How can I use a regular expression in Python to get the src of the image from the following HTML string?

<td width="80" align="center" valign="top"><a href="http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNFqz8ZCIf6NjgPPiTd2LIrByKYLWA&url=http://www.news.com.au/business/spain-victory-faces-market-test/story-fn7mjon9-1226390697278"><img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" border="1" width="80" height="80" /> NEWS.com.au</a></td>

I tried to use

matches = re.search('@src="([^"]+)"',text)
print(matches[0])

But I got nothing.

What is the '@' character supposed to match? There is no such character in your input string. — Martijn Pieters, Jun 10 '12 at 20:26
possible duplicate of [Python regular expression for HTML parsing (BeautifulSoup)](http://stackoverflow.com/q/55391/), [Python Reg Ex. problem](http://stackoverflow.com/q/90052/) — outis, Jun 12 '12 at 05:33

score 9 · Answer 1 · edited Jun 06 '17 at 03:06

Instead of regex, you could consider using BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(junk, 'html.parser')  # junk holds the HTML string from the question
>>> soup.findAll('img')
[<img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" border="1" width="80" height="80" />]
>>> soup.findAll('img')[0]['src']
u'//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg'

answered Jun 10 '12 at 20:33 by fraxel · edited Jun 06 '17 at 03:06 by clopez

Wouldn't Beautiful Soup add a lot of overhead to the solution? `img` tags are relatively easy to parse (and since they don't enclose other text, they are usually formatted correctly). – Jeff Tratner Jun 11 '12 at 15:21
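
If the extra dependency is the concern, one possible middle ground is the standard library's `html.parser` module. The following is only a sketch: the class name `ImgSrcCollector` and the shortened sample string are made up for illustration and are not from the thread.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == 'img':
            for name, value in attrs:
                if name == 'src':
                    self.srcs.append(value)


# Shortened version of the HTML from the question (href trimmed for brevity).
sample = ('<td><a href="#"><img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" '
          'alt="" border="1" width="80" height="80" /> NEWS.com.au</a></td>')

collector = ImgSrcCollector()
collector.feed(sample)
print(collector.srcs)
```

Self-closing tags such as `<img ... />` are also routed through `handle_starttag` by default, so they need no special handling here.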

score 6 · Accepted Answer · edited Jan 21 '13 at 17:11

Just drop the @ from the regex and it will work. No @ appears in the input, so the original pattern can never match.
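
A minimal sketch of the corrected call, assuming a shortened copy of the question's HTML in `text` (the href is trimmed here for readability). Note that `match[0]` is the whole match including `src="..."`, while `match.group(1)` is the captured URL alone.

```python
import re

# Shortened copy of the HTML from the question (href trimmed for readability).
text = ('<a href="http://news.google.com/news/url?sa=t">'
        '<img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" '
        'border="1" width="80" height="80" /> NEWS.com.au</a>')

# Same pattern as in the question, minus the leading '@'.
match = re.search(r'src="([^"]+)"', text)

# re.search returns None when nothing matches, so check before indexing.
if match:
    print(match[0])        # whole match, including src="..."
    print(match.group(1))  # captured URL only
```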

answered Jun 10 '12 at 20:26 by buckley · edited Jan 21 '13 at 17:11 by xpda

score -1 · Answer 3 · answered Jun 10 '12 at 20:30

You could simplify your re a little:

match = re.search(r'src="(.*?)"', text)

answered Jun 10 '12 at 20:30 by Joel Cornett

It gets javascript files too. – Bryan Dimas Oct 25 '15 at 19:18
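
One way to avoid picking up `<script src=...>` is to anchor the pattern to the `img` tag. This is a rough sketch, not a robust HTML parser: the sample string and the name `IMG_SRC` are invented for illustration, and the pattern assumes double-quoted attribute values with no `>` inside them.

```python
import re

# Hypothetical sample containing both a script tag and an image tag.
html = (
    '<script src="/static/app.js"></script>'
    '<img alt="" src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" width="80" />'
)

# Match src only inside an <img ...> tag. Requiring whitespace before "src"
# keeps attributes such as data-src from matching.
IMG_SRC = re.compile(r'<img\b[^>]*?\ssrc="([^"]+)"', re.IGNORECASE)

img_srcs = IMG_SRC.findall(html)
print(img_srcs)
```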
