
In my MVC web application I am trying to parse an HTML document. It mostly works, but characters like æ, å and ø are not parsed correctly; I get garbled special characters instead.

Here is my code:

var html = new HtmlDocument();
html.LoadHtml(new WebClient().DownloadString("http://cricketforbundet.no/index.php/en/klubber"));
var root = html.DocumentNode;
var p = root.Descendants("table").FirstOrDefault().Descendants("tr").Skip(1).FirstOrDefault().ChildNodes.Where(i=>i.Name == "td").FirstOrDefault().InnerText;

In p, the ø comes back garbled, whereas I should get Bjørvika Cricket Klubb.

Any thoughts? I am using HtmlAgilityPack to parse the HTML in ASP.NET.

mohsinali1317

1 Answer


You have to use Load instead of LoadHtml, and make sure to pass UTF-8 as the encoding:

        // Needs: using System.Linq; using System.Net; using System.Text; using HtmlAgilityPack;
        WebClient webClient = new WebClient();
        HtmlDocument html = new HtmlDocument();
        html.Load(webClient.OpenRead("http://cricketforbundet.no/index.php/en/klubber"), Encoding.UTF8);
        var root = html.DocumentNode;
        var p = root.Descendants("table").FirstOrDefault().Descendants("tr").Skip(1).FirstOrDefault().ChildNodes.Where(i => i.Name == "td").FirstOrDefault().InnerText;
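To see why the encoding matters, here is a minimal sketch that needs no network access. It assumes the page is served as UTF-8 and that the bytes were decoded with a single-byte encoding instead; ISO-8859-1 is used purely as an illustration, and the class and method names (EncodingDemo, Misdecode) are hypothetical.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Encodes text as UTF-8 bytes, then decodes those bytes with a different
    // encoding, which is what happens when a UTF-8 response is read as if it
    // were a single-byte code page.
    public static string Misdecode(string text, Encoding wrongEncoding)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return wrongEncoding.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "\u00F8" is ø; the escape avoids depending on the source file's own encoding.
        string original = "Bj\u00F8rvika Cricket Klubb";

        // ISO-8859-1 is an assumed example of a wrong encoding, not necessarily
        // the one in effect on the asker's machine.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        string garbled = Misdecode(original, latin1);

        // Decoding the same bytes as UTF-8 restores the text.
        string restored = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(original));

        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(garbled);   // the two UTF-8 bytes of ø appear as two separate characters
        Console.WriteLine(restored);
        Console.WriteLine(garbled.Length - original.Length); // 1
    }
}
```

If you would rather keep DownloadString and LoadHtml, setting webClient.Encoding = Encoding.UTF8 before the download should have the same effect, since WebClient uses that property to decode the response into a string.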

check this answer

Community
Abdul Hadi