I'm trying to download an HTML document from Amazon, but for some reason I get a badly encoded string like "��K��g��g�e".
Here's the code I tried:
using (var webClient = new System.Net.WebClient())
{
    var url = "https://www.amazon.com/dp/B07H256MBK/";
    webClient.Encoding = Encoding.UTF8;
    var result = webClient.DownloadString(url);
}
The same thing happens when using HttpClient:
var url = "https://www.amazon.com/dp/B07H256MBK/";
var httpclient = new HttpClient();
var html = await httpclient.GetStringAsync(url);
I also tried reading the result as bytes and then converting it to UTF-8, but I still get the same result. Also note that this DOES NOT always happen. For example, yesterday I ran this code for ~2 hours and got a correctly encoded HTML document. Today, however, I always get a badly encoded result. It happens every other day, so it's not a one-time thing.
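One way to narrow this down is to inspect the raw bytes before decoding them. The sketch below is not the code from the attempts above. It is a minimal offline illustration that assumes, as a guess only, that the garbled output could be a gzip-compressed body rather than a text-encoding problem. `EncodingCheck`, `LooksLikeGzip`, and `Gzip` are hypothetical helper names, and the compressed sample merely stands in for a server response.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class EncodingCheck
{
    // True when the buffer starts with the gzip magic number (0x1F 0x8B).
    public static bool LooksLikeGzip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Gzip-compresses a string; used here only to simulate a compressed response body.
    public static byte[] Gzip(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        byte[] plain = Encoding.UTF8.GetBytes("<html>hello</html>");
        byte[] compressed = Gzip("<html>hello</html>");

        Console.WriteLine(LooksLikeGzip(plain));      // False
        Console.WriteLine(LooksLikeGzip(compressed)); // True

        // Decoding compressed bytes as UTF-8 yields unreadable text,
        // no matter which Encoding is set on the client.
        Console.WriteLine(Encoding.UTF8.GetString(compressed));
    }
}
```

The same check could be applied to the array returned by `webClient.DownloadData(url)`. If the first two bytes are not 1F 8B, this assumption does not hold and the cause lies elsewhere.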
==================================================================
However, when I use HtmlAgilityPack's wrapper, it works as expected every time:
var url = "https://www.amazon.com/dp/B07H256MBK/";
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument doc = htmlWeb.Load(url);
What causes WebClient and HttpClient to return a badly encoded string even when I explicitly set the correct encoding? And why does HtmlAgilityPack's wrapper work by default?
Thanks for any help!