I am developing an application that needs to get the images on the first page of a Google image search. I have already figured out how to scrape the HTML of a Google search query, and how to open a URL, read the bytes of the photo, and load them into an Image object so I can display it on a Windows Form and save it to my PC.
Since I am not very good at HTML parsing, or at HTML in general, I would like a method that takes the HTML of the page and returns a list of strings containing the URLs of the images in it. Ideally I would get the full-resolution photo URLs, but for now anything would do.
I have tried this solution, but when I use the top answer there, ndx is -1. My guess is that Google has since changed their HTML and removed or renamed the images_table class. Is that right?
This is the code from the answer linked above:
    private List<string> GetUrls(string html)
    {
        var urls = new List<string>();
        int ndx = html.IndexOf("class=\"images_table\"", StringComparison.Ordinal);
        ndx = html.IndexOf("<img", ndx, StringComparison.Ordinal);
        while (ndx >= 0)
        {
            ndx = html.IndexOf("src=\"", ndx, StringComparison.Ordinal);
            ndx = ndx + 5;
            int ndx2 = html.IndexOf("\"", ndx, StringComparison.Ordinal);
            string url = html.Substring(ndx, ndx2 - ndx);
            urls.Add(url);
            ndx = html.IndexOf("<img", ndx, StringComparison.Ordinal);
        }
        return urls;
    }
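To make the goal concrete, here is a minimal sketch of the kind of method I have in mind. It assumes it is acceptable to collect the src of every <img> tag on the page instead of anchoring on the images_table class. The names ImageUrlExtractor and GetImageUrls are made up for illustration; the only library calls are Regex and WebUtility.HtmlDecode from .NET. A regex is not a robust HTML parser, and this would only return whatever image URLs are literally present in the markup, so it is probably not the full-resolution answer I am after.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: the class and method names are illustrative only.
public static class ImageUrlExtractor
{
    // Matches the src attribute (double- or single-quoted) of any <img> tag.
    // Requiring whitespace before "src" avoids matching attributes like data-src.
    private static readonly Regex ImgSrc = new Regex(
        "<img\\b[^>]*?\\ssrc\\s*=\\s*(?:\"(?<url>[^\"]*)\"|'(?<url>[^']*)')",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static List<string> GetImageUrls(string html)
    {
        var urls = new List<string>();
        if (string.IsNullOrEmpty(html))
        {
            return urls; // nothing to scan, and no exception on missing markup
        }

        foreach (Match m in ImgSrc.Matches(html))
        {
            // Attribute values are HTML-encoded (e.g. &amp;), so decode them.
            string url = WebUtility.HtmlDecode(m.Groups["url"].Value);

            // Assumption: only http/https URLs are wanted; this skips
            // inline data: URIs and relative paths. Duplicates are dropped.
            if (url.StartsWith("http", StringComparison.OrdinalIgnoreCase)
                && !urls.Contains(url))
            {
                urls.Add(url);
            }
        }
        return urls;
    }
}
```

It would be called as List<string> urls = ImageUrlExtractor.GetImageUrls(html); and returns an empty list, not an exception, when no matching tags are found.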
How can I reimplement this method so that it works as intended? I am using C#. If I did anything wrong with the question or its formatting, or if there is any information I still need to provide, please tell me, as I am new to programming and Stack Overflow. You can also suggest another (free) website or API I can use to get images from the web. Thanks in advance.