1

I have an async method that calls a mapper to turn an HTML string into an IEnumerable:

public async Task<IEnumerable<MovieRatingScrape>> GetMovieRatingsAsync(string username, int page)
{
    var response = await _httpClient.GetAsync($"/betyg/{username}?p={page}");
    response.EnsureSuccessStatusCode();
    var html = await response.Content.ReadAsStringAsync();
    return new MovieRatingsHtmlMapper().Map(html);
}

...

public class MovieRatingsHtmlMapper : HtmlMapperBase<IEnumerable<MovieRatingScrape>>
{
    // In reality, this method belongs to base class with signature T Map(string html)
    public IEnumerable<MovieRatingScrape> Map(string html)
    {
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);
        return Map(htmlDocument);
    }

    public override IEnumerable<MovieRatingScrape> Map(HtmlDocument item)
    {
        var movieRatings = new List<MovieRatingScrape>();
        var nodes = item.DocumentNode.SelectNodes("//table[@class='list']/tr");

        foreach (var node in nodes)
        {
            var title = node.SelectSingleNode("//td[1]/a")?.InnerText;

            movieRatings.Add(new MovieRatingScrape
            {
                Date = DateTime.Parse(node.SelectSingleNode("//td[2]")?.InnerText),
                Slug = node.SelectSingleNode("//td[1]/a[starts-with(@href, '/film/')]")?
                    .GetAttributeValue("href", null)?
                    .Replace("/film/", string.Empty),
                SwedishTitle = title,
                Rating = node.SelectNodes($"//td[3]/i[{XPathHasClass("fa-star")}]").Count
            });
        }

        return movieRatings;
    }
}

The resulting list movieRatings contains copies of the same object. However, when I look at the HTML, and when I inspect the HtmlNode node in the debugger, the rows differ as they are supposed to.

Either I'm blind to something really obvious, or I am hitting some async issue that I do not grasp. Any ideas? I should be getting 50 unique objects out of this call, but I am getting the first one 50 times.

Thank you in advance, Viktor.

Edit: Adding some screenshots to show my predicament. Look at locals InnerHtml (node) and title for item 1 and 2 of the foreach loop.

Edit 2: Managed to reproduce on .NET Fiddle: https://dotnetfiddle.net/A2I4CQ

[Screenshots: debugger locals for the first and second iterations of the foreach loop]

Viktor
  • When you manually step through the foreach loop, does the `movieRatings` list update correctly? – Stuart Aitken Oct 21 '19 at 02:18
  • `public T Map(string html)` I don't see the definition of type `T`. – Theodor Zoulias Oct 21 '19 at 02:31
  • @StuartAitken No, it does not. `node` seems to be correct, but `title` and `movieRatings` get the wrong data. – Viktor Oct 21 '19 at 10:50
  • @TheodorZoulias I'm sorry, I made a simplification for the purpose of the question, see updated code. – Viktor Oct 21 '19 at 10:51
  • I can't see anything obvious. My advice is to grab a logging framework, and log to a file at various points inside your code, until you find the source of duplication. – Theodor Zoulias Oct 21 '19 at 15:43
  • @Viktor Try LINQ. `movieRatings = nodes.Select(node => new MovieRatingScrape{ /*Date=node.selectNodes.... , Slug = , Title = ... etc... */ }).ToList();` This won't necessarily work, but it's an alternative route to the same result. Would be interesting to see what happens. – Stuart Aitken Oct 22 '19 at 02:59
  • @StuartAitken LINQ is producing the same result with `.Select()`. This is really odd and I don't know how to proceed without debugging HTML Agility Pack. – Viktor Oct 22 '19 at 22:28
  • @Viktor I guess it's time to ask the HTML Agility folk! https://github.com/zzzprojects/html-agility-pack/issues – Stuart Aitken Oct 22 '19 at 23:56
  • Off topic, but you are returning an IEnumerable. Why not use `yield return` for each MovieRatingScrape? That way you avoid waiting for the whole list to be built before returning. Just a suggestion. – Aaron. S Oct 23 '19 at 04:48
  • @Aaron.S Yes, off topic! No, but I was yielding before I got to tearing my code apart looking for the culprit of this. – Viktor Oct 23 '19 at 07:50
  • @StuartAitken Created an issue now. I successfully created a [fiddle](https://dotnetfiddle.net/A2I4CQ), very thankful that it is not only on my machine. So either my code or HAP code. Probably my code somehow. – Viktor Oct 23 '19 at 10:04
  • Possible duplicate of [Html Agility Pack SelectSingleNode giving always same result in iteration?](https://stackoverflow.com/questions/15185404/html-agility-pack-selectsinglenode-giving-always-same-result-in-iteration) – Theodor Zoulias Oct 23 '19 at 14:52

2 Answers

2

You need to use .// and not // in the XPath expressions you evaluate on each row node.

Here is the fixed Fiddle: https://dotnetfiddle.net/dZkSRN


// searches anywhere in the document, even when the query is run on a child node

.// searches only among the descendants of the current node
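
To make the difference concrete, here is a minimal sketch. The sample markup, the class name RelativeXPathDemo and the Titles helper are made up for illustration; only the HtmlAgilityPack calls already used in the question (LoadHtml, SelectNodes, SelectSingleNode, InnerText) are assumed.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class RelativeXPathDemo
{
    // Hypothetical sample markup, shaped like the table in the question.
    public const string SampleHtml =
        "<table class='list'>" +
        "<tr><td><a href='/film/alpha'>Alpha</a></td></tr>" +
        "<tr><td><a href='/film/beta'>Beta</a></td></tr>" +
        "</table>";

    // Collects the link text of every row, using the given XPath on each row node.
    public static List<string> Titles(string rowXPath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(SampleHtml);

        var titles = new List<string>();
        foreach (var row in document.DocumentNode.SelectNodes("//table[@class='list']/tr"))
        {
            titles.Add(row.SelectSingleNode(rowXPath)?.InnerText);
        }
        return titles;
    }

    public static void Main()
    {
        // Absolute path: evaluated from the document root on every iteration,
        // so each row yields the first match in the whole document.
        Console.WriteLine(string.Join(", ", Titles("//td[1]/a")));   // Alpha, Alpha

        // Relative path: evaluated from the current row.
        Console.WriteLine(string.Join(", ", Titles(".//td[1]/a")));  // Alpha, Beta
    }
}
```

In the question's mapper, the same change would apply to each of the four per-row expressions (title, date, slug and rating); the initial //table[@class='list']/tr query runs on the document node, so it can stay as it is.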

Jonathan Magnan
  • Thanks, I will give you the correct answer! Then it all makes sense. I don't know how I missed that particular detail, I thought I read that `//` meant searching from current node down. – Viktor Oct 23 '19 at 16:12
-1

I'm not sure how best to describe this, but I think your issue is here:

//table[@class='list']/tr

specifically the leading //.

I ran into the same thing while looking for a span, and I had to use something like this:

    var nodes = htmlDoc.DocumentNode.SelectNodes("//li[@class='itemRow productItemWrapper']");
    foreach (HtmlNode node in nodes)
    {
        // Re-parse the item's inner HTML so that "//" only sees this item
        var nodeDoc = new HtmlDocument();
        nodeDoc.LoadHtml(node.InnerHtml);

        string name = nodeDoc.DocumentNode.SelectSingleNode("//span[@class='productDetailTitle']").InnerText;
    }
Aaron. S