My imported text contains several Unicode escape sequences:

\u0092 \u0093 \u0094 \u0095 \u0096

Sample text:

string str = " Canadian Equity Funds may also invest in or use derivative instruments as described in “Investment Strategies – Use of Derivative Instruments” ";

Example C# text:

" may also invest in or use derivative instruments as described in \u0093Investment Strategies \u0096 Use of Derivative Instruments\u0094."

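I tried using this:

// Requires: using System.Globalization; using System.Text.RegularExpressions;
// Matches a literal backslash, then u or U, then four hex digits.
Regex rx = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");
var newString = rx.Replace(input, match =>
    ((char)Int32.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());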

I get exactly the same string.
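
One way to see why the replacement leaves the string unchanged is to check what the string actually contains. The sketch below is hypothetical (the `EscapeProbe` class and `Describe` helper are not from the original post); it lists the code point of every backslash or non-ASCII character. If it reports U+005C, the text really holds literal `\u0093` sequences and the regex should match. If it reports U+0093, the string holds a single control character that the debugger merely shows in escaped form, so there is nothing for the regex to replace.

```csharp
using System;
using System.Linq;

public static class EscapeProbe
{
    // Lists every backslash or non-ASCII character as a U+XXXX code point.
    public static string Describe(string input)
    {
        return string.Join(" ", input
            .Where(c => c > 127 || c == '\\')
            .Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // Verbatim string: six separate characters, backslash 'u' '0' '0' '9' '3'.
        string literal = @"in \u0093Investment";
        // Regular string: the compiler turns \u0093 into one control character.
        string control = "in \u0093Investment";

        Console.WriteLine(Describe(literal)); // U+005C
        Console.WriteLine(Describe(control)); // U+0093
    }
}
```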

I tried about every way I could to convert these to actual text, but to no avail. What am I supposed to do with them?

ZeroCool
  • Impossible to answer without the string and your code. Post a *reproducible* example. Does your string really have spaces between the encodings for example? How was it produced? – Panagiotis Kanavos Mar 31 '17 at 14:32
  • I added sample text – ZeroCool Mar 31 '17 at 14:37
  • .NET works only with Unicode strings. There's no need to use any kind of encoding or sequences. Escape sequences in string literals are interpreted as actual Unicode characters. Either change whatever process generates that string to use Unicode, or use a regex to parse these sequences. – Panagiotis Kanavos Mar 31 '17 at 14:38
  • I have tried using every regex I could find and in the end I just get the same string. – ZeroCool Mar 31 '17 at 14:39
  • A reproducible example is something people can copy into Visual Studio, run and reproduce the problem. You haven't provided any example of the string yet, e.g. `var myString = "sdfsdfs";` Besides, .NET uses Unicode strings only. Where did this string come from? Why isn't it an actual Unicode string? – Panagiotis Kanavos Mar 31 '17 at 14:39
  • *What* did you try? Where did these characters come from? These are *not* Unicode characters. Someone went and wrote `\`, `u`, `0`, `0`, `9` and `2` instead of `’`. Why? Why don't you fix *that* conversion? – Panagiotis Kanavos Mar 31 '17 at 14:42
  • Your sample text is perfectly OK. It doesn't contain any escape sequences. – Panagiotis Kanavos Mar 31 '17 at 14:42
  • I edited with an actual string. These values come from a source where I cannot change the input. This is the input I have, in the string str. – ZeroCool Mar 31 '17 at 14:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/139604/discussion-between-zerocool-and-panagiotis-kanavos). – ZeroCool Mar 31 '17 at 14:43
  • When the text is loaded with a CSV reader, I get the text with \u00XX characters. On the front end, these characters are displayed as []. Perhaps I don't understand the problem and it is the front end that needs a different encoding? – ZeroCool Mar 31 '17 at 14:47
  • No. As I said, .NET is Unicode. You get such problems if you try to apply *ANY* encoding because you think there's a problem. You were able to post such a Unicode string on a site created with ASP.NET - StackOverflow. Post a line of the CSV file. *How* do you load that file? – Panagiotis Kanavos Mar 31 '17 at 14:52
  • Load the file using – ZeroCool Mar 31 '17 at 14:53
  • In fact, are you *sure* there are any encodings at all? Where did you see these escape sequences? Did you mistake some debugger display for an escape sequence? That's exactly the case where replacements won't work - there are no escape sequences to begin with. – Panagiotis Kanavos Mar 31 '17 at 14:54
  • Yeah, I was looking at it in the debugger display. The symbols on the front end are displayed as “ and some are just – ZeroCool Mar 31 '17 at 14:56
  • Here is the response I get from the server: use derivative instruments as described in Investment Strategies Use of Derivative Instruments (in here these symbols don't even show, but some of the invalid symbols are displayed as a square like this []) – ZeroCool Mar 31 '17 at 14:58
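
The code points U+0092 to U+0096 are C1 control characters, which would explain the squares on the front end. In the Windows-1252 code page, however, the bytes 0x92 to 0x96 are ’ “ ” • and –, which fits the sample text. Assuming the string really contains those control characters (for instance because a Windows-1252 file was read with a Latin-1 style decoder, which is only a guess about this setup), a minimal sketch of a repair maps them back explicitly. `Cp1252Repair` and `Repair` are hypothetical names, and the table covers only the five characters listed in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Cp1252Repair
{
    // Windows-1252 meanings of the five code points listed in the question.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { '\u0092', '\u2019' }, // right single quotation mark
        { '\u0093', '\u201C' }, // left double quotation mark
        { '\u0094', '\u201D' }, // right double quotation mark
        { '\u0095', '\u2022' }, // bullet
        { '\u0096', '\u2013' }, // en dash
    };

    // Replaces each mapped control character; all other characters pass through.
    public static string Repair(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            char mapped;
            sb.Append(Map.TryGetValue(c, out mapped) ? mapped : c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string broken = "described in \u0093Investment Strategies \u0096 Use of Derivative Instruments\u0094.";
        Console.WriteLine(Repair(broken));
    }
}
```

If the code that reads the CSV file can be changed, decoding the file as Windows-1252 in the first place would likely make this step unnecessary.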

0 Answers