I am creating a program that processes XML-formatted text. I found that when the quotes around tag values are non-ASCII ones rather than the standard ones (double quote, ASCII 34; single quote, ASCII 39), parsing throws an exception. Such quotes can come from editing software such as MS Word (automatic formatting).
Currently I go through each line of my text box and replace the quotes before processing the XML. Here is the code (in C#):
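int nLines = textBox1.Lines.Length;
for (int i = 0; i < nLines; i++)
{
    // get the current line and replace typographic quotes with standard ones
    // (no '|' inside [...]: within a character class it would match a literal pipe)
    string line = Regex.Replace(textBox1.Lines[i], "[\u2018\u2019\u201A]", "'");
    line = Regex.Replace(line, "[\u201C\u201D\u201E]", "\"");
    // ... XML processing of line continues here ...
}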
Is there a better, more correct, or faster way to achieve this? By "more correct" I mean a method that covers almost all possible quote characters (I have heard that \d matches not only 0-9 but also other Unicode digits, so perhaps something similar exists for quotes). Thanks in advance!
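To make the question concrete, here is a minimal sketch of the kind of thing I have in mind: two precompiled character classes applied once to the whole text instead of line by line. The class name QuoteNormalizer and the extra code points U+201B and U+201F are my own assumptions, and the lists are probably not exhaustive.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (name and character lists are assumptions, not a
// complete inventory of Unicode quotation marks).
static class QuoteNormalizer
{
    // Single-quote look-alikes: U+2018, U+2019, U+201A, U+201B
    static readonly Regex SingleQuotes =
        new Regex("[\u2018\u2019\u201A\u201B]", RegexOptions.Compiled);

    // Double-quote look-alikes: U+201C, U+201D, U+201E, U+201F
    static readonly Regex DoubleQuotes =
        new Regex("[\u201C\u201D\u201E\u201F]", RegexOptions.Compiled);

    // Replaces typographic quotes with ASCII 39 and ASCII 34 in one pass each.
    public static string Normalize(string text)
    {
        text = SingleQuotes.Replace(text, "'");
        return DoubleQuotes.Replace(text, "\"");
    }
}
```

With this, a single call such as QuoteNormalizer.Normalize(textBox1.Text) would stand in for the loop. As far as I can tell, the Unicode categories \p{Pi} and \p{Pf} are not a drop-in equivalent of \d here: they also match guillemets and do not distinguish single from double quotes, which is why the sketch uses explicit lists.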