0

Is it possible to set custom encoding when loading pages with the method below?

HtmlWeb hwWeb = new HtmlWeb();
HtmlDocument hd = hwWeb.Load("myurl");

I want to set encoding to "iso-8859-9".

I use C# 4.0 and WPF.

Edit: The question has been answered on MSDN.

Mason Wan
Furkan Gözükara

3 Answers

5

I suppose you could try overriding the encoding in the HtmlWeb object.

Try this:

var web = new HtmlWeb
{
    AutoDetectEncoding = false,
    OverrideEncoding = myEncoding,
};
var doc = web.Load(myUrl);

Note: The OverrideEncoding property appears to have been added to Html Agility Pack in revision 76610, so it is not available in the current release, v1.4 (66017). The next best option is to read the page manually and apply the encoding yourself.
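One way to read the page manually, sketched under a few assumptions: the class and method names (`ManualLoad`, `LoadWithEncoding`) are hypothetical, and the encoding name is whatever the target page really uses. The sketch downloads the raw bytes and decodes them explicitly, so no automatic detection is involved. On .NET Framework, which the question targets, "iso-8859-9" should be available by default; newer .NET runtimes may need the code pages encoding provider registered first.

```csharp
using System.Net;
using System.Text;
using HtmlAgilityPack;

static class ManualLoad
{
    // Hypothetical helper: downloads the raw response bytes, decodes them
    // with the requested encoding, and parses the resulting string.
    public static HtmlDocument LoadWithEncoding(string url, string encodingName)
    {
        Encoding encoding = Encoding.GetEncoding(encodingName);

        using (var client = new WebClient())
        {
            // DownloadData returns the undecoded bytes, so the encoding
            // chosen here is the only one applied to the page.
            byte[] bytes = client.DownloadData(url);
            string html = encoding.GetString(bytes);

            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            return doc;
        }
    }
}
```

A call might look like `var doc = ManualLoad.LoadWithEncoding(myUrl, "iso-8859-9");`.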

Jeff Mercado
  • HtmlAgilityPack does not recognize OverrideEncoding. – Furkan Gözükara Oct 25 '11 at 01:27
  • Ah, sorry. It looks like the `OverrideEncoding` property of `HtmlWeb` is new and not in v1.4. I was using the latest version as reference ([source](http://htmlagilitypack.codeplex.com/SourceControl/changeset/view/91044#1336968)). I suppose the best option right now would be to load the page manually. – Jeff Mercado Oct 25 '11 at 01:30
  • I would also like to use the latest version. Where can I download it? – Furkan Gözükara Oct 25 '11 at 01:45
  • You would have to download the [latest source](http://htmlagilitypack.codeplex.com/SourceControl/changeset/changes/91044#) and compile it. It doesn't seem like they've released any precompiled versions that have this change. I might be able to do that for you, but I'm having internet connection problems at the moment, so I'd be of little help. Also, it goes without saying, but take care in using it as it might not be release-ready. – Jeff Mercado Oct 25 '11 at 01:53
  • I guess you could just open the solution in Visual Studio and hit compile. If it doesn't work, I'll have to get back to you on that, when my connection stabilizes and I'm able to verify what you would need to do to get it working. – Jeff Mercado Oct 25 '11 at 02:14
  • Yes, I did that, but it still does not recognize it. Very interesting. I used 1.4.0_beta2. – Furkan Gözükara Oct 25 '11 at 02:38
  • No, you don't want the beta build. You'll want the source that I linked to above ([link](http://htmlagilitypack.codeplex.com/SourceControl/changeset/changes/91044#)) and compile that. If you're still having problems, I'll try to compile it myself and upload it somewhere. – Jeff Mercado Oct 25 '11 at 23:28
  • Yes, the version found in the trunk is what you'll want to compile (make sure it is the "Release" version). I just did this and uploaded it [here](http://www.mediafire.com/?293fzu4v3ib367u). You could try out this version if you want or compile it yourself. – Jeff Mercado Oct 26 '11 at 00:05
3
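// Namespaces needed: System.IO, System.Net, System.Text, HtmlAgilityPack
var document = new HtmlDocument();

using (var client = new WebClient())
using (var stream = client.OpenRead(url))
using (var reader = new StreamReader(stream, Encoding.GetEncoding("iso-8859-9")))
{
    // Decode the response as ISO-8859-9, then parse the decoded string.
    var html = reader.ReadToEnd();
    document.LoadHtml(html);
}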

This is a simplified version of the solution answered here (for some reason it was deleted).
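A possible variation on the code above, offered only as a sketch: HtmlDocument also has a Load overload that takes a stream and an encoding, which avoids building the intermediate string. The class and method names here (`StreamLoad`, `LoadFromStream`) are hypothetical, and it is worth checking that the overload behaves as expected in the Html Agility Pack version you have.

```csharp
using System.Net;
using System.Text;
using HtmlAgilityPack;

static class StreamLoad
{
    // Hypothetical helper: opens the response stream and lets HtmlDocument
    // read it with the supplied encoding.
    public static HtmlDocument LoadFromStream(string url, Encoding encoding)
    {
        var document = new HtmlDocument();

        using (var client = new WebClient())
        using (var stream = client.OpenRead(url))
        {
            // The stream is decoded with the encoding passed in here.
            document.Load(stream, encoding);
        }

        return document;
    }
}
```

A call might look like `var doc = StreamLoad.LoadFromStream(url, Encoding.GetEncoding("iso-8859-9"));`.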

Mason Wan
0

A decent answer over here handles auto-detecting the encoding, along with some other nifty features:

C# and HtmlAgilityPack encoding problem

Community
Eric