
I am using lxml for the first time and am running into some issues when I try to scrape data. When I inspect the part of the page I want to scrape using developer tools, it looks like this (the webpage in question is https://nba.com/hawks/photos):

```html
<ul id="nbaImageGrid">
    <li class="grid-item">
        <a href="https://...">
            <img src="...">
        </a>
    </li>
    <li ...>
    </li>
</ul>
```

When I view the page source, however, the HTML looks like this:

```html
<ul id="nbaImageGrid" class="loading"></ul>
```

When I use lxml to get the `href` of the link in each `grid-item`, I get an empty list back:

```python
doc.xpath("//li[@class='grid-item']/a/@href")
```
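To see where the empty list comes from, the same XPath can be run against both snippets above. A minimal sketch using `lxml.html.fromstring`; the `example.com` hrefs and image names are hypothetical placeholders, since the real URLs are elided in the question. The XPath matches the developer-tools markup; it simply finds nothing in the markup the server sends.

```python
from lxml import html

# Markup as shown in developer tools (after the page's JavaScript has run).
# The hrefs and image names are hypothetical placeholders.
RENDERED = """
<ul id="nbaImageGrid">
    <li class="grid-item"><a href="https://example.com/photo-1"><img src="1.jpg"></a></li>
    <li class="grid-item"><a href="https://example.com/photo-2"><img src="2.jpg"></a></li>
</ul>
"""

# Markup as delivered by the server ("view page source").
PAGE_SOURCE = '<ul id="nbaImageGrid" class="loading"></ul>'

XPATH = "//li[@class='grid-item']/a/@href"

# Same XPath, two different inputs.
rendered_hrefs = html.fromstring(RENDERED).xpath(XPATH)
source_hrefs = html.fromstring(PAGE_SOURCE).xpath(XPATH)

print(rendered_hrefs)  # both placeholder links
print(source_hrefs)    # empty list, as in the question
```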

I understand that this webpage is not static and that its content is generated either client-side or server-side. Is it still possible to use lxml to scrape this page? If not, can you recommend another Python scraping library that I could use for pages like this? It would also help if you could explain whether this site is rendered client-side or server-side, and how you can tell.
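One common way to tell: compare what the server sends (View Page Source, or whatever `urllib`/`requests` returns) with the live DOM in developer tools. If the items appear only in developer tools, JavaScript added them in the browser, i.e. the grid is rendered client-side; the empty `<ul>` with `class="loading"` in the page source points that way here. The sketch below turns that comparison into a rough heuristic. `grid_is_empty_in_source` is a hypothetical helper name, and an empty grid in the source is a strong hint rather than proof.

```python
from lxml import html


def grid_is_empty_in_source(raw_html: str) -> bool:
    """Return True if #nbaImageGrid exists in the raw HTML but has no <li> items.

    Heuristic only: an empty container in the server's HTML suggests the items
    are inserted later by JavaScript (client-side rendering).
    """
    tree = html.fromstring(raw_html)
    grids = tree.xpath("//ul[@id='nbaImageGrid']")
    if not grids:
        # The grid is not in the source at all, so this check does not apply.
        return False
    return len(grids[0].xpath("./li")) == 0


# The page source from the question: the grid is present but empty.
print(grid_is_empty_in_source('<ul id="nbaImageGrid" class="loading"></ul>'))
```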

Thank you!

Chase
  • I would recommend Selenium for this kind of work: https://stackoverflow.com/questions/17361742/download-image-with-selenium-python – jufx Jan 04 '20 at 21:21
  • _Is it still possible to use lxml to scrape this page?_ Yes, although I think you're focusing on the wrong part of the process. _If not, can you recommend another python scraping library that I could use to scrape pages like this?_ Explicitly off-topic, see [help/on-topic]. _Additionally, it would be helpful if you could explain if this site is client-side or server-side, and how you can tell._ What do you mean? Are you referring to server-side HTML rendering? – AMC Jan 04 '20 at 21:56
  • This isn't really a question about lxml. LXML doesn't fetch HTML from a remote site; you're obviously using something else (`urllib`, `requests`, etc.) for that. Of course it's possible to use `lxml` to scrape data from a dynamic page, but you have to somehow fetch that data and feed it to `lxml`. – larsks Jan 04 '20 at 22:21
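The comments boil down to two separate steps: something fetches the HTML, then lxml parses it. A minimal sketch, assuming the `#nbaImageGrid` / `grid-item` markup from the question is still what the site uses (it may have changed since 2020). `fetch_raw_html` uses the standard library's `urllib`, which does not run JavaScript, so on this page it would be expected to reproduce the empty list. `fetch_rendered_html` drives headless Chrome through Selenium (installed separately, along with Chrome) and waits for the grid items before handing `page_source` to lxml. The function names, the 15-second timeout, and the `User-Agent` header are illustrative assumptions.

```python
from typing import List
from urllib.request import Request, urlopen

from lxml import html

# XPath from the question; assumes the grid markup seen in developer tools.
GRID_HREF_XPATH = "//li[@class='grid-item']/a/@href"


def parse_hrefs(page_html: str) -> List[str]:
    """Parse an HTML string with lxml and return the grid links as plain strings."""
    tree = html.fromstring(page_html)
    return [str(href) for href in tree.xpath(GRID_HREF_XPATH)]


def fetch_raw_html(url: str, timeout: float = 15.0) -> str:
    """Download the server's HTML as-is; no JavaScript is executed."""
    # A browser-like User-Agent is an assumption; some sites reject the default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_rendered_html(url: str, timeout: float = 15.0) -> str:
    """Load the page in headless Chrome and return the DOM after JavaScript ran.

    Selenium is imported here so parse_hrefs and fetch_raw_html work without it.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the page's JavaScript has inserted at least one grid item.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, "//li[@class='grid-item']"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

For example, `parse_hrefs(fetch_raw_html("https://nba.com/hawks/photos"))` would be expected to return `[]` if the grid is still filled in by JavaScript, while `parse_hrefs(fetch_rendered_html("https://nba.com/hawks/photos"))` should return the links once the grid has loaded. Keeping fetching and parsing apart means the lxml code stays the same whichever fetcher is used.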
