I have a string that looks like:

`4000 BCE&#8211;5000 BCE and 600 CE&#8211;650 CE`
I am trying to use a regex to search through the string, find all character codes, and replace each one with the character it represents. For my sample string, I want to end up with:

`4000 BCE–5000 BCE and 600 CE–650 CE`
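For reference, .NET already includes a decoder for these character codes. Below is a minimal sketch, assuming `System.Net.WebUtility.HtmlDecode` is available on the target framework. The class name `BuiltInDecode` is hypothetical. The sketch performs the whole replacement without a hand-written regex:

```csharp
using System.Net;

public static class BuiltInDecode
{
    // WebUtility.HtmlDecode converts character references back into characters:
    // decimal (&#8211;), hexadecimal (&#x2013;) and named ones such as &amp;.
    public static string Decode(string encoded) => WebUtility.HtmlDecode(encoded);
}
```

For example, `BuiltInDecode.Decode("600 CE&#8211;650 CE")` should give `600 CE–650 CE`.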
I tried writing it in code, but I can't figure out what to write:

    string line = "4000 BCE&#8211;5000 BCE and 600 CE&#8211;650 CE";
    listof?datatype matches = search through `line` and find all the matches to "&#.*?;"
    foreach (?datatype match in matches) {
        int extractedNumber = Convert.ToInt32(Regex.(/*extract the number that is between the &# and the ;*/));
        // convert the number to the corresponding character
        string actualCharacter = (char) extractedNumber + "";
        // replace the character code in the original line
        line = Regex.Replace(line, match, actualCharacter);
    }
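The loop above can be collapsed into a single `Regex.Replace` call that takes a `MatchEvaluator`. The following is a sketch, not a definitive implementation. It assumes only numeric references (`&#8211;` or `&#x2013;`) need decoding. The class `NumericEntityDecoder` and its pattern are hypothetical names chosen for illustration:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class NumericEntityDecoder
{
    // Matches decimal (&#8211;) and hexadecimal (&#x2013;) character references.
    // The named groups capture only the digits, so "&#" and ";" never need trimming.
    private static readonly Regex EntityPattern =
        new Regex(@"&#(?:(?<dec>[0-9]+)|[xX](?<hex>[0-9a-fA-F]+));");

    public static string Decode(string input)
    {
        // Regex.Replace invokes the lambda once per match and splices its return value
        // into the result, so no manual loop over a MatchCollection is needed.
        return EntityPattern.Replace(input, match =>
        {
            int codePoint;
            bool parsed;
            if (match.Groups["dec"].Success)
            {
                parsed = int.TryParse(match.Groups["dec"].Value, NumberStyles.None,
                                      CultureInfo.InvariantCulture, out codePoint);
            }
            else
            {
                parsed = int.TryParse(match.Groups["hex"].Value, NumberStyles.AllowHexSpecifier,
                                      CultureInfo.InvariantCulture, out codePoint);
            }

            // Leave the reference untouched if it overflowed or is not a valid Unicode
            // scalar value (negative, a surrogate, or above U+10FFFF); ConvertFromUtf32 would throw.
            if (!parsed || codePoint < 0 || codePoint > 0x10FFFF ||
                (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            {
                return match.Value;
            }

            // Unlike a (char) cast, ConvertFromUtf32 returns a surrogate pair for code points above U+FFFF.
            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

Because the replacement happens inside one `Replace` call, the string is never rescanned after each substitution. There is also no need to escape a `Match` back into a pattern.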
Edit
My original string actually has some HTML in it and looks like this:

`4000 <small>BCE</small>&#8211;5000 <small>BCE</small> and 600 <small>CE</small>&#8211;650 <small>CE</small>`
I used `line = Regex.Replace(note, "<.*?>", string.Empty);` to remove the `<small>` tags. However, according to one of the most popular questions on SO, "RegEx match open tags except XHTML self-contained tags", you really should not use RegEx to remove HTML.
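One regex-free option for a fragment this simple is to let an XML parser handle both the tags and the character codes. The sketch below makes a strong assumption: the fragment must be well-formed XML once wrapped in a root element, meaning every tag is closed and it contains no HTML-only named entities such as `&nbsp;`. The helper `FragmentText.ExtractText` is hypothetical. For arbitrary or malformed HTML, a real HTML parser such as HtmlAgilityPack is the safer route.

```csharp
using System.Xml.Linq;

public static class FragmentText
{
    // Hypothetical helper. Assumes the fragment is well-formed XML once wrapped in a root
    // element: every tag closed and no HTML-only named entities such as &nbsp;.
    // XElement.Parse throws System.Xml.XmlException when that assumption does not hold.
    public static string ExtractText(string fragment)
    {
        // The wrapper supplies the single root element XML requires. The XML parser decodes
        // numeric character references such as &#8211; while it reads the text.
        // PreserveWhitespace keeps whitespace-only text between tags (e.g. "</small> <small>").
        XElement root = XElement.Parse("<root>" + fragment + "</root>", LoadOptions.PreserveWhitespace);

        // Value concatenates every descendant text node, so the <small> tags disappear
        // while their contents (BCE, CE) are kept.
        return root.Value;
    }
}
```

Applied to the string above, this should return `4000 BCE–5000 BCE and 600 CE–650 CE` in one step.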