Please excuse this lengthy question. I've written a C# app that uses WebClient.DownloadFileAsync to download a file and save it to a client computer.
This works for a PDF file whose URL doesn't change. However, I'm also trying to download some audio files with a .mp3.zip extension.
If I input the URL for these files, I'm taken directly to the file download site where I'm presented with a dialog to either select individual files or click a link to "Download All Files".
I want to programmatically download the entire .mp3.zip file.
The problem with the "Download All Files" link is that its URL appears to include a randomly named folder. For example, in http://download.site.org/files/audio_books/xx/zipfile.mp3.zip, the xx is a folder name that changes.
If the URL for the audio files always had the same exact location, I could use WebClient.DownloadFileAsync without a problem. I'm able to manually read the Outer HTML if I inspect the element for the link, but I've observed that this (xx) changes monthly.
If I could find a way to successfully parse the URL in the Download link, I could verify what the current (xx) folder name is and then use WebClient normally.
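To make the goal concrete, here is a minimal sketch of the URL handling I have in mind once the href is in hand. DownloadUrlHelper, GetFolderName and ResolveHref are hypothetical names of my own, and the sketch assumes the changing folder is always the path segment directly above the file name, as in the example URL above.

```csharp
using System;

public static class DownloadUrlHelper
{
    // Hypothetical helper: returns the folder that directly contains the file,
    // e.g. "xx" for http://download.site.org/files/audio_books/xx/zipfile.mp3.zip.
    public static string GetFolderName(string downloadUrl)
    {
        var uri = new Uri(downloadUrl);

        // Uri.Segments splits the path into "/", "files/", "audio_books/", "xx/", "zipfile.mp3.zip".
        string[] segments = uri.Segments;
        if (segments.Length < 3)
        {
            return string.Empty; // no folder above the file name
        }

        return segments[segments.Length - 2].TrimEnd('/');
    }

    // Hypothetical helper: turns a relative href from the page into an absolute
    // URL that WebClient can use. An href that is already absolute is returned unchanged.
    public static Uri ResolveHref(string pageUrl, string href)
    {
        return new Uri(new Uri(pageUrl), href);
    }
}
```

The resolved Uri could then be passed straight to WebClient.DownloadFileAsync, so the folder name itself would only be needed for checking or logging.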
I've searched extensively and read through numerous Stack Overflow questions, for example "Grabbing just the URL of an href using HTMLAgilityPack" and "Image scraper with C#", but none of the suggestions appear to return the (xx) folder name contained in the Outer HTML.
I came across another Stack Overflow post, "Parse inner HTML", which appears to be the closest match to my question.
This is what I've tried, but it throws a NullReferenceException:
    using System.IO;
    using System.Linq;
    using System.Net;

    // Fetch the raw page source.
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = "GET";
    req.UserAgent = "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US))";
    string source;
    using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
    {
        source = reader.ReadToEnd();
    }

    // Parse the source and look for the link inside the div with class "flRight".
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(source);
    string hrefValue = doc.DocumentNode
        .Descendants("div")
        .Where(x => x.Attributes["class"].Value == "flRight")
        .Select(x => x.Element("a").Attributes["href"].Value)
        .FirstOrDefault();
Can anyone suggest why the Where clause that reads the class attribute's Value is throwing the exception, or what I need to change? I feel I'm really close to solving this, because if I inspect the download button's element, I can see what I need inside a div with that class.
P.S. Is the only way to ask follow-up questions to edit my original post or to use the limited comment box?
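One variant I could try is sketched below. It rests on assumptions I haven't confirmed: that the exception comes from div elements with no class attribute (Attributes["class"] would be null for those) or from a matching div with no direct a child, and that the link is present in the downloaded source rather than added later by script. DownloadLinkFinder and FindDownloadHref are hypothetical names; GetAttributeValue is the HtmlAgilityPack method that returns a default instead of null when an attribute is missing.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class DownloadLinkFinder
{
    // Hypothetical helper: returns the first non-empty href found inside a div
    // whose class list contains "flRight", or null if there is no such link.
    public static string FindDownloadHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")
            // GetAttributeValue falls back to "" for divs without a class attribute.
            .Where(div => div.GetAttributeValue("class", "")
                             .Split(' ')
                             .Contains("flRight"))
            // Descendants (not Element) so that a nested link is found too.
            .SelectMany(div => div.Descendants("a"))
            .Select(a => a.GetAttributeValue("href", ""))
            .FirstOrDefault(href => href.Length > 0);
    }
}
```

Unlike my original query, this searches all a descendants of the div rather than only a direct child, and it treats the class attribute as a space-separated list. If it returns null on the real page, that would suggest the link is not in the source that HttpWebRequest receives.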