How to replace web codes with their equivalent symbols?

Question

This could be a duplicate question, but I have no idea what search terms to use, so don't be hard on me if it has been asked before (and I'm pretty sure it has).

So I am getting a web page's source code using the WebClient class and saving the entire string in the source variable:

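using System.IO;
using System.Net;

string source;
// Download the page and read the whole response body into 'source'; the using blocks dispose everything.
using (var client = new WebClient())
{
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    using (var data = client.OpenRead(urlAddress))
    using (var reader = new StreamReader(data))
    {
        source = reader.ReadToEnd();
    }
}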

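Now I want to extract certain parts of the source variable, especially user-posted messages. The problem is that the page source does not contain the characters themselves but codes standing in for them: "&" appears as &amp;, an apostrophe (') appears as a code that renders as ’, and quotes (") appear as codes that render as –, “, ” and who knows what else.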

Well, I could replace each of those codes with its symbol using the string Replace method, but I would like to know if there is a way to convert all such codes to the actual (expected) symbols at once. Is there a method that does that, or maybe a library or utility class somewhere on the Internet?

The term describing what you are seeing is "HTML encoding": http://en.wikipedia.org/wiki/Character_encodings_in_HTML — Jesse Webb, Sep 11 '12 at 16:47
Thank you for the reference. Now I also learned that this thread could answer my question http://stackoverflow.com/questions/122641/how-can-i-decode-html-characters-in-c — IneedHelp, Sep 11 '12 at 17:06

Justin Niessner · Accepted Answer · 2012-09-11T17:01:32.767


Try using HttpUtility.HtmlDecode or HttpServerUtility.HtmlDecode.
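As a quick illustration, the sketch below decodes a made-up fragment standing in for part of the downloaded source string. It assumes a project that references System.Web.dll (on .NET Framework) or a modern .NET console app, where System.Web.HttpUtility ships with the shared framework; the sample text is hypothetical.

```csharp
using System;
using System.Web;

// Hypothetical fragment of scraped page source containing named and numeric entities.
string encoded = "Tom &amp; Jerry said &quot;hi&quot;, it&#8217;s fine";

// HtmlDecode replaces every recognized entity with the character it stands for,
// so no per-code Replace calls are needed.
string decoded = HttpUtility.HtmlDecode(encoded);

Console.WriteLine(decoded);
```

The same call handles numeric references such as &#8217; and named ones such as &amp; in one pass.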

edited Sep 11 '12 at 17:01

answered Sep 11 '12 at 16:45

Justin Niessner


And forget about using the ClientProfile. – H H Sep 11 '12 at 16:58
Yup, because of System.Web.dll. Thank you, mister Justin! – IneedHelp Sep 11 '12 at 17:00
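If referencing System.Web.dll is a problem (for example when targeting the .NET Framework 4 Client Profile), a possible alternative, as far as I know, is System.Net.WebUtility.HtmlDecode, which lives in System.dll (available since .NET 4.0). A minimal sketch, again on hypothetical input:

```csharp
using System;
using System.Net;

// Hypothetical encoded text, e.g. a user post extracted from the page source.
string post = "Fish &amp; chips &ldquo;today&rdquo; &#39;cheap&#39;";

// WebUtility.HtmlDecode is in System.dll, so no System.Web reference is required.
string text = WebUtility.HtmlDecode(post);

Console.WriteLine(text);
```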

