I am trying to get the image source from an article.

import requests
from bs4 import BeautifulSoup

url = "http://www.thehindu.com/entertainment/movies/why-does-kollywood-lack-financial-transparency/article23432094.ece"
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')
body = soup.find("body")
# Container div that wraps the article's lead image
imageparentobject = body.find("div", class_="lead-img-cont")
# The lead image itself, matched by its CSS class
image = imageparentobject.find("img", class_="lead-img")
print(image['src'])

This is the output for the above code:

http://www.thehindu.com/static/img/1x1_spacer.gif

This is the image element.

<img src="http://www.thehindu.com/entertainment/movies/article23432092.ece/alternates/FREE_660/04mp-trade2jpg"
     data-variant="FREE" data-device-variant="FREE~FREE~FREE"
     data-src-template="http://www.thehindu.com/entertainment/movies/article23432092.ece/BINARY/thumbnail/04mp-trade2jpg"
     data-proxy-image="http://www.thehindu.com/entertainment/movies/article23432092.ece/ALTERNATES/FREE_215/04mp-trade2jpg"
     data-proxy-width="" style="width:100%;"
     alt="Why does Kollywood lack financial transparency?"
     title="Why does Kollywood lack financial transparency?"
     class="media-object adaptive placeholder lead-img">

This is the source I need:

http://www.thehindu.com/entertainment/movies/article23432092.ece/alternates/FREE_660/04mp-trade2jpg

I need this URL, not the one in 'data-src-template'.

  • The data you want is in the `data-src-template` attribute, not the `src`. – Daniel Roseman Apr 09 '18 at 09:30
  • Thanks for the reply. I want the data in src. The img element changes when I use .find. Is there a better way to do it? – Sachin Shetti Apr 09 '18 at 09:40
  • The data is *not in* src, and it doesn't "change" when you use find: src holds the spacer GIF. Presumably the site uses some kind of JS framework to render the page dynamically, but from the point of view of the HTML, the data is in the attribute I mentioned. – Daniel Roseman Apr 09 '18 at 09:42
  • I agree with Daniel Roseman: because the page is rendered with JavaScript, the real src value of the img element is not available in the raw source. If you want to render the page with JavaScript, see my answer to https://stackoverflow.com/questions/45259232/scraping-google-finance-beautifulsoup/45259523#45259523 or try https://html.python-requests.org/ – Dan-Dev Apr 09 '18 at 09:43
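
The point made in the comments can be sketched without rendering any JavaScript: read the data-* attribute from the raw HTML and build the URL from it. The snippet below is only a sketch under stated assumptions. `RAW_IMG` is a trimmed, hand-written copy of the element (the spacer `src` is taken from the output shown in the question, and the data-* values are assumed to be the same in the raw HTML as in the rendered element). `LeadImageParser` and `lead_image_url` are hypothetical helper names. The rule that swaps `/BINARY/thumbnail/` for `/alternates/FREE_660/` is inferred only from comparing the two URLs in the question, so it may not match what the site's script really does. It uses the standard-library `html.parser` so that it runs offline.

```python
from html.parser import HTMLParser

# Assumed shape of the <img> in the raw (unrendered) HTML: src is the
# spacer GIF and the real image paths sit in the data-* attributes.
RAW_IMG = (
    '<div class="lead-img-cont">'
    '<img src="http://www.thehindu.com/static/img/1x1_spacer.gif" '
    'data-src-template="http://www.thehindu.com/entertainment/movies/'
    'article23432092.ece/BINARY/thumbnail/04mp-trade2jpg" '
    'data-proxy-image="http://www.thehindu.com/entertainment/movies/'
    'article23432092.ece/ALTERNATES/FREE_215/04mp-trade2jpg" '
    'class="media-object adaptive placeholder lead-img">'
    '</div>'
)


class LeadImageParser(HTMLParser):
    """Keeps the attributes of the first <img> whose classes include 'lead-img'."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if tag != "img" or self.attrs is not None:
            return
        attr_map = dict(attrs)
        if "lead-img" in (attr_map.get("class") or "").split():
            self.attrs = attr_map


def lead_image_url(html, variant="FREE_660"):
    """Return a guessed image URL for the lead image, or None if there is none."""
    parser = LeadImageParser()
    parser.feed(html)
    if parser.attrs is None:
        return None
    template = parser.attrs.get("data-src-template")
    if template and "/BINARY/thumbnail/" in template:
        # Assumption: the rendered src differs from the template only in
        # this path segment (inferred from the URLs in the question).
        return template.replace("/BINARY/thumbnail/", "/alternates/%s/" % variant)
    # Fall back to whatever src holds (possibly the spacer GIF).
    return parser.attrs.get("src")


print(lead_image_url(RAW_IMG))
```

With BeautifulSoup, as in the question, the same attribute can be read with `image.get("data-src-template")`. If the guessed URL pattern turns out not to hold, rendering the page with JavaScript, as suggested in the last comment, is the fallback.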
