
I fetched email content via the Gmail API, decoded its Base64 URL-encoded string to a byte[], and converted that to a readable HTML string. My goal is to convert the HTML string into a parsable string without \r, \n, or \".

  var threadText = System.Text.Encoding.Default.GetString(threadBytes);
  threadText = threadText.Replace("\r", string.Empty);
  threadText = threadText.Replace("\n", string.Empty);
  threadText = threadText.Replace("\\\"", "\"");
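For reference, here is a minimal sketch of the decoding step. It assumes the message part's body data is base64url (the RFC 4648 alphabet that uses `-` and `_`, often with the `=` padding omitted) and that the HTML is UTF-8. `GmailBody.DecodeBase64Url` is a hypothetical helper, not part of any Gmail client library. It decodes with `Encoding.UTF8` instead of `Encoding.Default`, because on .NET Framework `Encoding.Default` is the machine's ANSI code page.

```csharp
using System;
using System.Text;

// Hypothetical helper: decode a base64url string (assumed UTF-8 content) to text.
public static class GmailBody
{
    public static string DecodeBase64Url(string data)
    {
        // base64url uses '-' and '_' where standard Base64 uses '+' and '/'.
        string base64 = data.Replace('-', '+').Replace('_', '/');

        // Restore any omitted '=' padding so the length is a multiple of 4,
        // which Convert.FromBase64String requires. A remainder of 1 is invalid
        // input and makes FromBase64String throw a FormatException.
        switch (base64.Length % 4)
        {
            case 2: base64 += "=="; break;
            case 3: base64 += "="; break;
        }

        byte[] bytes = Convert.FromBase64String(base64);

        // Decode explicitly as UTF-8 rather than relying on Encoding.Default.
        return Encoding.UTF8.GetString(bytes);
    }
}

public static class Program
{
    public static void Main()
    {
        // "PGI-aGk8L2I-" is the base64url form of "<b>hi</b>".
        Console.WriteLine(GmailBody.DecodeBase64Url("PGI-aGk8L2I-")); // prints <b>hi</b>
    }
}
```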

I looked at the value of threadText returned by GetString():

<html xmlns=\"http://www.w3.org/1999/xhtml\">\r\n<head>\r\n<meta content=\"width=device-width, initial-scale=1.0\" name=\"viewport\"/>\r\n

Replacing \r and \n works, but replacing \" does not.
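The `\"` and `\r\n` in that value are C# escape sequences in the debugger's display, not characters in the string. The minimal sketch below reproduces the start of the value as a C# literal and shows why the third `Replace` finds nothing. The string holds a plain `"` with no preceding backslash, while the CR and LF characters are actually present. `EscapeDemo` is a hypothetical name used only for illustration.

```csharp
using System;

public static class EscapeDemo
{
    // The start of the debugger value, written as a C# literal. In source code,
    // \" and \r\n are escape sequences: the resulting string holds '"', CR and LF,
    // and no backslash characters at all.
    public const string Sample =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">\r\n<head>";

    public static void Main()
    {
        // False: there is no backslash in the string.
        Console.WriteLine(Sample.IndexOf('\\') >= 0);

        // True: Replace("\\\"", "\"") searches for backslash + quote, finds nothing,
        // and returns an equal string.
        Console.WriteLine(Sample.Replace("\\\"", "\"") == Sample);

        // True: real CR and LF characters are present, so those Replace calls work.
        Console.WriteLine(Sample.IndexOf("\r\n", StringComparison.Ordinal) >= 0);
    }
}
```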

String manipulation is the last thing I would like to do.

Is there a decent method to strip these \r, \n, and \" sequences?

Franva
  • I think you want `threadText = threadText.Replace("\"", "'")` – Sean Mar 13 '20 at 14:12
  • This would replace " not \, wouldn't it? I think OP wants to replace the combination of the two, if I am reading it correctly. – Tyler Hundley Mar 13 '20 at 14:14
  • Are you sure this isn't just the debugger representation you are seeing / copied from Quick View? In any case, carriage return and linefeed should not keep you from parsing otherwise valid HTML. – Filburt Mar 13 '20 at 14:15
  • hi @TylerHundley that's correct. – Franva Mar 13 '20 at 14:16
  • @Sean I want to replace not only ", but also \. – Franva Mar 13 '20 at 14:16
  • @Filburt how can I verify it? I hovered over threadText and I can still see them. – Franva Mar 13 '20 at 14:17
  • Your debugger is inserting the escape characters. They aren't part of the actual string. –  Mar 13 '20 at 14:19
  • @RenéVogt "\\"" throws an error. The first \ is used to escape \, and the third \ is used to escape ". – Franva Mar 13 '20 at 14:19
  • @Amy how can I verify it? I hovered over threadText and I can still see \r \n \". – Franva Mar 13 '20 at 14:19
  • Click the magnifying glass icon next to it in your debugger to use the string visualizer. Refer to the debugger documentation for screenshots if needed. –  Mar 13 '20 at 14:21
  • A good HTML Parser will parse HTML with `\r\n` included. Do not attempt to [Parse HTML using Regex](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Thomas Weller Mar 13 '20 at 14:24
  • Thanks @Amy, I used the Text Visualizer and can see that the string does not contain \r \n \". So I hope those extra characters are inserted by the debugger and will not affect my HTML parsing. – Franva Mar 13 '20 at 14:26
  • @Franva Again, those won't appear in the Text Visualizer because it is *visualizing* the newline characters. You see newlines. You won't see *escaped* newlines. The newline characters `\r\n` *are* in your string. Do you understand? –  Mar 13 '20 at 14:27
  • @ThomasWeller yep, I agree. I do not like the string manipulation. – Franva Mar 13 '20 at 14:27
  • @Amy yep, I understood it. thanks – Franva Mar 13 '20 at 14:28
  • Hi @Amy, you are correct. I think this question should have an answer for people who run into the same issue. If you would like to post an answer, I will accept it. – Franva Jul 22 '20 at 12:51
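One way to check what the string actually contains, independent of how the debugger renders it, is to print each character's UTF-16 code unit. In the hypothetical helper below, CR shows up as `U+000D`, LF as `U+000A`, a double quote as `U+0022`, and a backslash would appear as `U+005C`. If no `U+005C` appears, there is no backslash to strip. The CR/LF characters are real, but as Thomas Weller's comment notes, a good HTML parser handles them.

```csharp
using System;
using System.Linq;

public static class StringInspector
{
    // Render each character as its UTF-16 code unit (e.g. "U+000D"), so control
    // characters and quotes are unambiguous regardless of debugger formatting.
    public static string CodeUnits(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        string text = "a\"b\r\n";
        Console.WriteLine(CodeUnits(text)); // prints U+0061 U+0022 U+0062 U+000D U+000A
    }
}
```

Applied to `threadText`, this lists every character, including the quotes and line breaks, as its plain code unit value.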
