I am writing a spider to download all images on the front page of a subreddit using scrapy. To do so, I have to find the image links to download the images from and use a CSS or XPath selector.
Upon inspection, the links are provided but the HTML looks like this for all of them:
<div class="expando expando-uninitialized" style="display: none" data-cachedhtml=" <div class="media-preview" id="media-preview-7lp06p" style="max-width: 861px"> <div class="media-preview-content"> <a href="https://i.redd.it/29moua43so501.jpg" class="may-blank"> <img class="preview" src="https://i.redditmedia.com/Q-LKAeFelFa9wAdrnvuwCMyXLrs0ULUKMsJTXSf3y34.jpg?w=861&s=69085fb507bed30f1e4228e83e24b6b2" width="861" height="638"> </a> </div> </div> " data-pin-condition="function() {return this.style.display != 'none';}"><span class="error">loading...</span></div>
From what I can tell, it looks like all of the new elements are being initialized inside the opening tag of the <div>
element. Could you explain what exactly is going on here, and how one would go about extracting image information from this?
*Sorry, I'm not quite sure how to properly format the html code, but there really isn't all too much to format, as it is all one big tag anyway.