This could be a duplicate question, but I have no idea what search terms to look up, so don't be hard on me if it has been asked before (and I'm pretty sure it was).
So I am getting a web page's source code using the WebClient class and saving the entire string in the source variable:
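// Requires: using System.IO; using System.Net;
string source;
using (var client = new WebClient())
{
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    // The using blocks dispose the reader first, then the stream, even if ReadToEnd throws
    using (var data = client.OpenRead(urlAddress))
    using (var reader = new StreamReader(data))
    {
        source = reader.ReadToEnd();
    }
}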
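Now I want to process certain text ranges from the source variable, especially user-posted messages. The problem is that in the web page's source, special characters are stored as HTML character codes rather than as the characters themselves: "&" is encoded (e.g. as &amp;amp;), "'" shows up as the code for ’, and quotes (") show up as the codes for –, “, ”, and who knows what else.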
Well, I could replace each of those codes with the actual symbol using the string Replace method, but that means knowing every code in advance. I would like to know if there is a way to convert all of those codes to the actual (expected) symbols at once. Is there a method that can do that, or maybe a library or some utility class on the Internet?
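To make the goal concrete, here is a minimal sketch of the kind of conversion I am after. It assumes the framework's System.Net.WebUtility.HtmlDecode (available from .NET 4.0) is usable here; the EntityDecoding class, the DecodeEntities name, and the sample string are made up for illustration and are not part of my actual code.

```csharp
using System;
using System.Net;

public static class EntityDecoding
{
    // Hypothetical helper: hands the whole string to WebUtility.HtmlDecode,
    // which decodes named entities (&amp;) and numeric ones (&#8217;) in one pass.
    public static string DecodeEntities(string html)
    {
        return WebUtility.HtmlDecode(html);
    }

    public static void Main()
    {
        // Sample input invented for illustration, not taken from a real page
        string encoded = "Tom &amp; Jerry said &#8220;it&#8217;s fine&#8221;";
        string decoded = DecodeEntities(encoded);
        Console.WriteLine(decoded);
    }
}
```

If something like this works, the source string could be passed through it once after ReadToEnd, instead of chaining one Replace call per code.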