Building on a code snippet I wrote earlier, I'm now trying to download multiple images at once from a given subreddit into a local directory. My problem is that I can't get my LINQ statement to work properly. I also don't want to download the thumbnail pictures, so I took a look at the HTML page and found that the links I want to retrieve sit at level 5, in the href attribute:

(...)
Level 1: <div class="content">...</div>
    Level 2: <div class="spacer">...</div>
        Level 3: <div class="siteTable">...</div>
            Level 4: <div class=" thing id-t3_6dj7qp odd  link ">...</div>                      
                Level 5: <a class="thumbnail may-blank outbound" href="http://i.imgur.com/jZ2ZAyk.jpg">...</a>

This was my best attempt for the line marked '???':

.Where(link => Directory.GetParent(link).Equals(@"http://i.imgur.com"))

Unfortunately, it throws an error stating:

 Object reference not set to an instance of an object

So now I know that this line is what fails, but I still have no clue how to rewrite it, since I'm fairly new to lambda expressions. To be honest, I don't really understand why I get a System.NullReferenceException on this line but not on the next one. What's the difference? Maybe my approach to this problem isn't good practice at all, so please let me know how I could proceed.

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Net;
using HtmlAgilityPack;

namespace GetAllImages
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> imageLinks = new List<string>();

            // Specify Directory manually
            string dirName = "Jessica Clements";
            string rootPath = @"C:\Users\Stefan\Desktop";
            string dirPath = Path.Combine(rootPath, dirName);

            // Specify the subReddit manually
            string subReddit = "r/Jessica_Clements";
            string url = @"https://www.reddit.com/" + subReddit;

            try
            {
                DirectoryInfo imageFolder = Directory.CreateDirectory(dirPath);                

                HtmlDocument document = new HtmlWeb().Load(url);
                imageLinks = document.DocumentNode.Descendants("a")
                            .Select(element => element.GetAttributeValue("href", null))
                            .Where(???) 
                            .Where(stringLink => !String.IsNullOrEmpty(stringLink))
                            .ToList();

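                // One client for all downloads. DownloadFile blocks until each file is written,
                // so the client is never disposed while a transfer is still running.
                using (WebClient wc = new WebClient())
                {
                    foreach (string link in imageLinks)
                    {
                        wc.DownloadFile(new Uri(link), Path.Combine(dirPath, Path.GetFileName(link)));
                    }
                }

                Console.WriteLine($"Files successfully saved in '{Path.GetFileName(dirPath)}'.");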

            }

            catch(Exception e)
            {
                while(e != null)
                {
                    Console.WriteLine(e.Message);
                    e = e.InnerException;
                }
             }

            if(System.Diagnostics.Debugger.IsAttached)
            {
                Console.WriteLine("Press any key to continue . . .");
                Console.ReadKey(true);
            }
        }
    }
}

Edit: In case anyone is interested, this is how I got it working in the end, using the answers below:

HtmlDocument document = new HtmlWeb().Load(url);
imageLinks = document.DocumentNode.Descendants("a")
            .Select(element => element.GetAttributeValue("href", null))
            .Where(link => (link?.Contains(@"http://i.imgur.com") == true))
            .Distinct()
            .ToList();
間澤東雲
  • A much better approach to your problem is to use the [JSON API](https://www.reddit.com/r/Jessica_Clements/.json) instead of parsing HTML. – Nasreddine May 27 '17 at 21:28

2 Answers

4

Given that this line throws the exception:

.Where(link => Directory.GetParent(link).Equals(@"http://i.imgur.com"))

I'd make sure that link is not null and that the result of GetParent(link) is not null either. So you could do:

.Where(link => link != null && (Directory.GetParent(link)?.Equals(@"http://i.imgur.com") ?? false))

Notice the null check and the ?. after GetParent(). The ?. stops evaluating the rest of the expression if GetParent() returns null. It is called the null-conditional operator, sometimes nicknamed the "Elvis operator" because it can be seen as two eyes with twirly hair. The ?? false supplies the default value for the case where evaluation stopped because of a null value.

However, if you plan to parse HTML code you should definitely have a look at the Html Agility Pack (HAP).
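
As a side note on the null-conditional pattern above: Directory.GetParent is documented to return null when the path is a root directory, which is one likely source of the NullReferenceException, whereas String.IsNullOrEmpty accepts null without complaint. GetParent also returns a DirectoryInfo, and DirectoryInfo does not override Equals, so comparing it to a string never yields true. The following is only a minimal sketch of the same ?. and ?? idea applied directly to the link string; the class and helper names are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullConditionalDemo
{
    // Hypothetical helper: true only for non-null links that start with the prefix.
    // link?.StartsWith(...) evaluates to null when link is null; ?? false maps that to false.
    public static bool StartsWithPrefix(string link, string prefix) =>
        link?.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) ?? false;

    public static void Main()
    {
        // Sample data standing in for the href values; null models an <a> without href.
        var links = new List<string> { null, "/r/Jessica_Clements", "http://i.imgur.com/jZ2ZAyk.jpg" };

        var matches = links
            .Where(link => StartsWithPrefix(link, "http://i.imgur.com"))
            .ToList();

        foreach (string match in matches)
        {
            Console.WriteLine(match);
        }
    }
}
```

Because the lambda never calls an instance method on a possibly-null value without ?., the order of the Where clauses no longer matters for null safety.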

Waescher
  • I've got a new idea: how about I swap both Where statements? That way wouldn't I ensure that link couldn't possibly be null? I'm mobile now so I can't check whether that's true or not. – 間澤東雲 May 27 '17 at 21:56
  • Yeah, perfectly right. Skip the null check then and go for the elvis operator. Alternatively, use `String.Equals(link, "...")`. By not using the variable `link` to call methods on, you cannot run into NullRefs – Waescher May 27 '17 at 21:56
1

If you are trying to get all links pointing to http://i.imgur.com, you need something like this:

    imageLinks = document.DocumentNode.Descendants("a")
                .Select(element => element.GetAttributeValue("href", null))
                .Where(link => link?.Contains(@"http://i.imgur.com") == true)
                .ToList();
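
One caveat with the substring test: Contains also matches a URL that merely embeds the text http://i.imgur.com somewhere else (for example in a query string), and it skips https links. A possible variation, sketched here under the assumption that only the host should decide, parses each link with Uri.TryCreate and compares Uri.Host. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImgurLinkFilter
{
    // Hypothetical helper: keeps absolute http/https links whose host is exactly i.imgur.com.
    // Uri.TryCreate returns false for null or malformed input, so no separate null check is needed.
    public static List<string> FilterImgurLinks(IEnumerable<string> links) =>
        links
            .Where(link => Uri.TryCreate(link, UriKind.Absolute, out Uri uri)
                           && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)
                           && string.Equals(uri.Host, "i.imgur.com", StringComparison.OrdinalIgnoreCase))
            .Distinct()
            .ToList();

    public static void Main()
    {
        // Sample data standing in for the href values scraped from the page.
        var hrefs = new string[]
        {
            null,
            "/r/Jessica_Clements",
            "http://i.imgur.com/jZ2ZAyk.jpg",
            "http://example.com/?u=http://i.imgur.com/jZ2ZAyk.jpg"
        };

        foreach (string link in FilterImgurLinks(hrefs))
        {
            Console.WriteLine(link);
        }
    }
}
```

In the original query this would replace the Where clause after the Select over GetAttributeValue("href", null).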
makison