
I am trying to increase the speed of my web-scraping app by turning my `foreach` loop into a `Parallel.ForEach`.

    public List<MovieTVInformation> ViewMovies()
    {
        List<MovieTVInformation> AllFoundMovies = new List<MovieTVInformation>(100);
        HtmlWeb website = new HtmlWeb();
        HtmlDocument doc = website.Load("http://www.imdb.com/chart/moviemeter");

        var MovieNames = doc.DocumentNode.SelectNodes("//*[@id='main']/div/span/div/div/div[3]/table/tbody/tr/td[2]").ToList();
        var ImageLocation = doc.DocumentNode.SelectNodes("//*[@id='main']/div/span/div/div/div[3]/table/tbody/tr/td[1]/a").ToList();
        var IMDBLinks = doc.DocumentNode.SelectNodes("//*[@id='main']/div/span/div/div/div[3]/table/tbody/tr/td[2]/a").ToList();

        Parallel.ForEach(MovieNames, (name, state, index) =>
        {
            if (index > 0 && index < 99)
            {

                AllFoundMovies.Add(new MovieTVInformation());

                var TempName = name.InnerText;
                TempName = AdjustName(TempName, Convert.ToInt32(index));
                AllFoundMovies[Convert.ToInt32(index)].Name = TempName;
            }
        });

        return AllFoundMovies;
    }

My issue is that the index goes out of bounds every time, giving an index-out-of-range error (the index must be non-negative and less than the size of the collection). I added an `if` statement to see whether that would stop it going out of bounds (as the list only has 100 entries), but still no success.
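The out-of-range error can be reproduced without any scraping or parallelism. The sketch below is a minimal illustration, assuming a placeholder `MovieTVInformation` class with only a `Name` property (the real class is not shown): `new List<T>(100)` only reserves capacity, so `Count` is still 0 and writing through the indexer throws `ArgumentOutOfRangeException`.

```csharp
using System;
using System.Collections.Generic;

// Placeholder for the asker's class; only a Name property is assumed here.
public class MovieTVInformation
{
    public string Name { get; set; } = string.Empty;
}

public static class CapacityDemo
{
    // Returns the Count of a list created with an initial capacity of 100.
    public static int CountAfterCapacityCtor()
    {
        var movies = new List<MovieTVInformation>(100);
        return movies.Count; // capacity is reserved, but no elements exist yet
    }

    // Returns true if writing to index 5 of a capacity-only list throws.
    public static bool IndexerThrowsOnEmptyList()
    {
        var movies = new List<MovieTVInformation>(100);
        try
        {
            // Same pattern as AllFoundMovies[index] in the question.
            movies[5] = new MovieTVInformation();
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var movies = new List<MovieTVInformation>(100);
        Console.WriteLine($"Capacity: {movies.Capacity}, Count: {movies.Count}");
        Console.WriteLine($"Indexer throws: {IndexerThrowsOnEmptyList()}");
    }
}
```

Capacity only controls how much storage is pre-allocated; `Count` grows only through `Add`, `Insert`, `AddRange` and similar calls.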

Could anyone tell me what I could be doing wrong?

Thanks

user7195486
  • List of things you are doing wrong: not reading the documentation of the `List` constructor, converting `int` to `int` using `Convert.ToInt32`, not debugging your code (looking at `AllFoundMovies.Count` could have helped), and finally not coming up with a [MCVE] - there is absolutely nothing related to `Parallel.ForEach` here... Please carefully re-read the linked standard duplicate (that you've obviously already seen while researching this question) and then [edit] the post if you still feel this is a new question. – Alexei Levenkov Mar 04 '18 at 19:20
  • I am pretty sure even if you do not use `Parallel.ForEach`, you will get the same exception. – CodingYoshi Mar 04 '18 at 19:20
  • It works fine with a normal `foreach`. – user7195486 Mar 04 '18 at 19:28
  • You are only initializing the `AllFoundMovies` list with a capacity; there are zero elements in it at that time. Inside your `ForEach` you are just adding a new element to it... so the first thread gets index 0... Initialize the collection by populating it with 100 new `MovieTVInformation` objects before your `Parallel.ForEach`, and remove the `AllFoundMovies.Add(new MovieTVInformation())` line. – Mufaka Mar 04 '18 at 20:00
  • Thank you :) Most people here just shit on noobs, but your help was perfect. I see what I did wrong and it works now. Thanks again, Mufaka – user7195486 Mar 04 '18 at 20:15
  • You are welcome! You've done well to get something working. Sure, there are some issues but you clearly asked a legitimate question that inappropriately got marked as a duplicate by someone who has forgotten the reason for the site. The question is 50% Parallel.ForEach and 50% List initialization confusion. There may be a duplicate question out there but definitely not the one that is linked to IMO. – Mufaka Mar 05 '18 at 02:38
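Mufaka's suggestion can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the asker's actual code: the HtmlAgilityPack nodes are replaced by a plain list of strings, `MovieTVInformation` is reduced to a `Name` property, `AdjustName` is a hypothetical stand-in (the real helper is not shown), and the question's `index > 0 && index < 99` filter is left out. The list is filled with one object per input before the loop, so every index is valid, and each iteration only assigns to the element at its own index. Dropping the `Add` also matters for thread safety, since `List<T>` is not safe for concurrent writes such as `Add` from several threads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Placeholder for the asker's class; only a Name property is assumed here.
public class MovieTVInformation
{
    public string Name { get; set; } = string.Empty;
}

public static class ParallelFillDemo
{
    // Hypothetical stand-in for the asker's AdjustName helper.
    static string AdjustName(string raw, int index) => $"{index}: {raw.Trim()}";

    // rawNames stands in for the InnerText of each scraped node.
    public static List<MovieTVInformation> ViewMovies(IList<string> rawNames)
    {
        // Pre-populate one element per input so Count == rawNames.Count before the loop.
        var allFoundMovies = Enumerable.Range(0, rawNames.Count)
                                       .Select(_ => new MovieTVInformation())
                                       .ToList();

        Parallel.ForEach(rawNames, (name, state, index) =>
        {
            int i = (int)index; // index is a long in this overload
            // Each iteration touches only its own element; with no Add,
            // the list is never resized while the loop runs.
            allFoundMovies[i].Name = AdjustName(name, i);
        });

        return allFoundMovies;
    }

    public static void Main()
    {
        var names = new List<string> { " Movie A ", " Movie B ", " Movie C " };
        foreach (var movie in ViewMovies(names))
            Console.WriteLine(movie.Name);
    }
}
```

Because results are written by index rather than appended, the output order matches the input order even though the iterations may run in any order.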

0 Answers