I am using HtmlAgilityPack to read and parse an HTML file and extract some text:

static void Main(string[] args)
{
    var webGet = new HtmlWeb();
    var document = webGet.Load("http://port.ro/");

    var programs = from program in document.DocumentNode.Descendants()
                   where program.Name == "a" && program.Attributes["href"] != null && program.InnerText.Trim().Length > 0
                   select program.InnerText;

    foreach (string s in programs)
    {
        Console.WriteLine(s);
    }

    Console.ReadLine();
}

My problem is that the website contains characters like à and when I print them, they are replaced by ?.

What do I need to do so that, when I print the text, the character à is either replaced by a or printed as à?

Adrian
  • possible duplicate of [c# unicode string output](http://stackoverflow.com/questions/5055659/c-sharp-unicode-string-output) – CodeCaster Nov 04 '11 at 08:29

2 Answers

1

Did you try setting the encoding required by the site? That should help you get the proper text:

var document = webGet.Load("http://port.ro/", Encoding.UTF8);//check your encoding

The one above is for HtmlDocument.

For HtmlWeb, try this:

var web = new HtmlWeb
{
    AutoDetectEncoding = false,
    OverrideEncoding = myEncoding,
};
var doc = web.Load(myUrl);
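
Even when the page is decoded correctly, the console itself can still turn à into ?, because its output encoding may not cover the character. The following is a minimal sketch, not part of the original answer: it sets `Console.OutputEncoding` to UTF-8 (the console font must also contain the glyph) and, as a fallback for the "replace à with a" option in the question, strips combining marks after Unicode decomposition. The names `ConsoleOutputSketch` and `RemoveDiacritics` are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

static class ConsoleOutputSketch
{
    // Hypothetical helper: decomposes each character (à becomes a + combining
    // grave accent), drops the combining marks, then recomposes the rest.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        // Ask the console to emit UTF-8 instead of its default code page.
        Console.OutputEncoding = Encoding.UTF8;

        string sample = "\u00e0 la carte"; // "à la carte"
        Console.WriteLine(sample);                   // accented form
        Console.WriteLine(RemoveDiacritics(sample)); // unaccented fallback
    }
}
```

Stripping marks loses information, so it is probably best kept as a last resort for consoles that cannot display the accented characters at all.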
V4Vendetta
1

HtmlAgilityPack has a property for setting the stream encoding. Normally it should autodetect the encoding, but that may not be working for your page (wrong meta tags, etc.).
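
One way to take autodetection out of the picture is to download the raw bytes yourself and hand the parser an explicit encoding. This is a rough sketch under assumptions: it uses `HtmlDocument.Load(Stream, Encoding)`, it assumes the page is UTF-8 (check the real encoding of the site), and the names `ExplicitEncodingSketch` and `ExtractLinkTexts` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using HtmlAgilityPack;

static class ExplicitEncodingSketch
{
    // Hypothetical helper: parses raw page bytes with a caller-chosen encoding
    // and returns the trimmed, non-empty text of every <a> that has an href.
    public static List<string> ExtractLinkTexts(byte[] pageBytes, Encoding encoding)
    {
        var document = new HtmlDocument();
        using (var stream = new MemoryStream(pageBytes))
        {
            // Read the stream with the given encoding rather than a detected one.
            document.Load(stream, encoding);
        }

        return document.DocumentNode.Descendants("a")
            .Where(a => a.Attributes["href"] != null)
            .Select(a => a.InnerText.Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }

    static void Main()
    {
        using (var client = new WebClient())
        {
            // Fetch the undecoded bytes so no earlier step guesses the encoding.
            byte[] pageBytes = client.DownloadData("http://port.ro/");

            // Assumption: the page is UTF-8; substitute the site's real encoding.
            foreach (string text in ExtractLinkTexts(pageBytes, Encoding.UTF8))
            {
                Console.WriteLine(text);
            }
        }
    }
}
```

If the text still comes out wrong with this approach, the encoding passed in is probably not the one the server actually uses.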

Kamil Lach