IMDb scraping with C# and AngleSharp: cannot scrape img tag properly

Question

I am implementing an IMDb scraper, but it gets the wrong image src value. The images are not loaded immediately, so at the time I read it, src still points to a default placeholder image rather than the movie's picture. How can I get the correct value of the src attribute?

This is my code:

    private IDocument GetMovies(int number)
    {
        var document = _context
            .OpenAsync($"https://www.imdb.com/search/title/?groups=top_1000&sort=user_rating,desc&count={number}&start=201&ref_=adv_nxt").GetAwaiter().GetResult();

        return document;
    }

Here I get the values of the images:

    var images = document.QuerySelectorAll("img").Select(x => x.GetAttribute("src"));

Well, you have reached the limit of what a plain `HttpRequest` can do. For further scraping you need something that can render the DOM. In JavaScript you can use libraries like **jsdom** (which is not a headless browser), but I have no idea about an equivalent for C#. For pure C# solutions you can check this SO [answer](https://stackoverflow.com/questions/10886161/load-a-dom-and-execute-javascript-server-side-with-net). Or you can use the .NET versions of libraries like **Playwright**, **Puppeteer**, etc. - Eldar, Sep 04 '22 at 08:14
Depending on how much you plan to scrape, you should consider using [IMDb's official API](https://imdb-api.com/API). And always check the [robots.txt](https://www.imdb.com/robots.txt). - Xerillio, Sep 04 '22 at 09:04
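
Before reaching for a headless browser, it may be worth checking whether the downloaded HTML already carries the real image URL in a lazy-load attribute next to the placeholder src. The following is a minimal sketch of that idea using AngleSharp only. The attribute names `loadlate` and `data-src` are assumptions and must be confirmed by inspecting the actual page source; the class `LazyImageExample`, its helper methods, and the sample markup are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;

public static class LazyImageExample
{
    // Assumed lazy-load attribute names; verify them against the real page source.
    private static readonly string[] LazyAttributes = { "loadlate", "data-src" };

    // Returns the first non-empty lazy-load attribute, falling back to src.
    public static string ResolveImageUrl(IElement img)
    {
        foreach (var name in LazyAttributes)
        {
            var value = img.GetAttribute(name);
            if (!string.IsNullOrWhiteSpace(value))
            {
                return value;
            }
        }

        return img.GetAttribute("src") ?? string.Empty;
    }

    // Collects one resolved URL per img element in the document.
    public static List<string> GetImageUrls(IDocument document) =>
        document.QuerySelectorAll("img").Select(ResolveImageUrl).ToList();

    public static void Main()
    {
        // Made-up markup standing in for the downloaded page.
        const string html = @"<html><body>
            <img src='placeholder.png' loadlate='poster1.jpg'>
            <img src='poster2.jpg'>
        </body></html>";

        var document = new HtmlParser().ParseDocument(html);
        foreach (var url in GetImageUrls(document))
        {
            Console.WriteLine(url);
        }
    }
}
```

With the sample markup above, the first image resolves to its `loadlate` value and the second falls back to `src`. If the real page has no such attribute and the URL is only set by JavaScript, this approach will not help, and a rendering tool such as the ones mentioned in the comments is needed instead.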
