
I implemented an IMDb scraper, but it gets the wrong image `src` values. The images are loaded lazily, so when I read the page the `src` attribute still holds a default placeholder image instead of the movie's picture. How can I get the correct value of the `src` attribute?

This is my code:

    private IDocument GetMovies(int number)
    {
        var document = _context
            .OpenAsync($"https://www.imdb.com/search/title/?groups=top_1000&sort=user_rating,desc&count={number}&start=201&ref_=adv_nxt").GetAwaiter().GetResult();

        return document;
    }

Here I get the values of the images:

    var images = document.QuerySelectorAll("img").Select(x => x.GetAttribute("src"));
  • 1
    Well, you have reached the limit of what a plain `HttpRequest` can do. For further scraping you need something that can render the DOM. In JavaScript you could use a library like **jsdom** (which is not a headless browser), but I don't know of an equivalent for C#. For pure C# solutions, see this SO [answer](https://stackoverflow.com/questions/10886161/load-a-dom-and-execute-javascript-server-side-with-net). Alternatively, you can use the .NET versions of libraries like **playwright**, **puppeteer**, etc. – Eldar Sep 04 '22 at 08:14
  • 2
    Depending on how much you plan to scrape, you should consider using [IMDb's official API](https://imdb-api.com/API). And always check the [robots.txt](https://www.imdb.com/robots.txt) – Xerillio Sep 04 '22 at 09:04
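
Before reaching for a rendering engine as suggested in the comments above, it may be worth checking whether the real URL is already present in the static HTML. Lazy-loading pages often keep it in a separate attribute and put only a placeholder in `src`. The following is a minimal sketch, assuming the question's code uses AngleSharp. The attribute names `loadlate` and `data-src` are assumptions, not confirmed for this page, so inspect the downloaded HTML to see which attribute (if any) holds the poster URL. `LazyImages`, `ResolveSource`, and `GetImageSources` are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AngleSharp;
using AngleSharp.Dom;

public static class LazyImages
{
    // Assumed attribute names, checked in order of preference.
    // Verify them against the actual page source; "src" is the fallback.
    private static readonly string[] CandidateAttributes = { "loadlate", "data-src", "src" };

    // Returns the first non-empty candidate attribute value, or null if none is set.
    public static string ResolveSource(IElement img) =>
        CandidateAttributes
            .Select(name => img.GetAttribute(name))
            .FirstOrDefault(value => !string.IsNullOrWhiteSpace(value));

    // Collects one resolved URL per <img> element, skipping images without any.
    public static IEnumerable<string> GetImageSources(IDocument document) =>
        document.QuerySelectorAll("img")
            .Select(ResolveSource)
            .Where(url => url != null);
}

public static class Program
{
    public static void Main()
    {
        // Inline HTML stands in for the downloaded page so the sketch runs offline.
        const string html =
            "<img src=\"placeholder.png\" loadlate=\"poster1.jpg\">" +
            "<img src=\"poster2.jpg\">";

        var context = BrowsingContext.New(Configuration.Default);
        var document = context.OpenAsync(req => req.Content(html)).GetAwaiter().GetResult();

        foreach (var url in LazyImages.GetImageSources(document))
        {
            Console.WriteLine(url); // prints poster1.jpg, then poster2.jpg
        }
    }
}
```

If none of the candidate attributes appears in the raw HTML, the URLs are probably injected by JavaScript, and a browser-based tool such as the ones mentioned in the comments is the likely way forward.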

0 Answers