I am parsing emails using a regex in a c# VSTO project. Once in a while, the regex does not seem to work (although if I paste the text and regex in regexbuddy, the regex correctly matches the text). If I look at the email in gmail, I see
=E2=80=8B
at the beginning and end of some lines (which I understand is the UTF8 zero width space); this appears to be what is messing up the regex. This seems to be only sequence showing up.
What is the easiest way to get rid of this exact sequence? I cannot do the obvious
MailItem.Body.Replace("=E2=80=8B", "")
because those characters don't show up in the c# string.
I also tried
byte[] bytes = Encoding.Default.GetBytes(MailItem.TextBody);
string myString = Encoding.UTF8.GetString(bytes);
But the zero-width spaces just show up as ?. I suppose I could go through the bytes array and remove the bytes comprising the zero width space, but I don't know what the bytes would look like (it does not seem as simple as converting E2 80 8B to decimal and searching for that).