
NOTE: To keep the question short, "how" in the title means "is there a method for this in the .NET Framework?".

Just a reminder:

\n

If the above is the printable representation of a string, then its internal representation is one character, not two, with code 0x0a.

However, I could write:

string s = "\\n"; // three characters in editor

which translates to an internal representation of two characters, \ and n. This is not the 0x0a character!

But that is exactly what I want to achieve. I already have some data in a string, which I know is the printable representation of a string, and I would like to convert it to its internal representation (something the C# compiler does all the time).

string printable = "\\n";
string internal_ = convert(printable);

internal_ would now hold only one character, with code 0x0a.

Question: is there a ready-to-use (!) function for such a conversion?

greenoldman
    You could get the bytes using one of the `Encoding.GetBytes` methods. – Patrick Hofman Oct 27 '14 at 14:30
  • @PatrickHofman, it gives you the bytes of the string, but it does not interpret the string. This `convert` could be called repeatedly on a string, and each call would collapse every escape sequence into a single character. In short, it does something different. – greenoldman Oct 27 '14 at 14:37
  • Maybe you need to make your question clearer, since I think you ask for a way to get `0x0a`, not `\\n`. – Patrick Hofman Oct 27 '14 at 14:38
    So to be clear: you're looking for a method that translates the string `@"\n"` to the character `0x0a`? – CodeCaster Oct 27 '14 at 14:38
  • When building a localization system for my project, I unescape the newline sequence with a simple `Replace("\\n", "\xd\xa")` (and do the opposite for escaping). You could make something more convenient using the same principle. – Sinatr Oct 27 '14 at 14:39
  • @CodeCaster, as an example yes. – greenoldman Oct 27 '14 at 14:40
  • @Sinatr, yes, I know the drill, that's why I asked for something ready to use :-). – greenoldman Oct 27 '14 at 14:40
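A minimal sketch of the Replace-based approach from Sinatr's comment, assuming C# 9 or later (top-level statements). The helper name `UnescapeBasic` is hypothetical, it handles only `\r`, `\n` and `\t`, and it maps `\n` to LF (0x0a) rather than CRLF. It also illustrates why this approach does not scale to full literals: `Replace` knows nothing about escaped backslashes.

```csharp
using System;

Console.WriteLine(UnescapeBasic(@"line1\r\nline2")); // prints two lines

// Hypothetical helper in the spirit of Sinatr's comment: only \r, \n and \t are handled,
// and \n becomes LF (0x0a) rather than CRLF.
// Caveat: Replace knows nothing about escaped backslashes, so @"\\n" yields '\' followed
// by LF instead of the two characters '\' and 'n' that a C# compiler would produce.
static string UnescapeBasic(string s) =>
    s.Replace(@"\r", "\r")
     .Replace(@"\n", "\n")
     .Replace(@"\t", "\t");
```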

2 Answers


You could try the static System.Text.RegularExpressions.Regex.Unescape method:

string internal_ = Regex.Unescape(printable);

Unfortunately, it is intended mainly for regex escape sequences, so it does not cover every C# escape.

Test:

var chars = internal_.ToCharArray();

The chars array has 1 element with code 0x0a.
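The test above can be expanded into a small runnable program, assuming C# 9 or later (top-level statements). Keep in mind that `Regex.Unescape` follows regex escape rules rather than C# ones: for instance, `\x` there takes exactly two hex digits, whereas a C# string literal accepts one to four.

```csharp
using System;
using System.Text.RegularExpressions;

string printable = @"\n";                  // two characters: '\' and 'n'
string internal_ = Regex.Unescape(printable);

Console.WriteLine(printable.Length);       // 2
Console.WriteLine(internal_.Length);       // 1
Console.WriteLine((int)internal_[0]);      // 10, i.e. 0x0a

// Other escapes Regex.Unescape understands include \t, \xHH (exactly two hex digits) and \uHHHH.
Console.WriteLine(Regex.Unescape(@"A\tB\x41\u0042") == "A\tBAB"); // True
```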

Dmitry

This is actually quite complex: there are many more cases than the one you describe that need to be handled in order to parse string literals. Consider, for example, `\x0a`. Fortunately, you are not the first person to ask for this. Regex.Unescape handles most, but not all, cases. DeepDiver has a blog post with code that parses C# string literals; it should do what you ask.
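For the cases Regex.Unescape does not cover, a hand-written parser is not very long. The following is a minimal sketch, not the code from DeepDiver's post and not a framework API; `UnescapeCSharp` and `AppendHex` are hypothetical names. It covers the classic C# simple escapes plus `\x`, `\u` and `\U`, takes the literal's contents without the surrounding quotes, and assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Globalization;
using System.Text;

Console.WriteLine(UnescapeCSharp(@"col1\tcol2\nrow2")); // prints a tab and a real line break

// Hypothetical helper: converts the printable form of a C# regular string literal
// (without the surrounding quotes) into its internal representation.
// Throws FormatException on unrecognized or truncated escape sequences.
static string UnescapeCSharp(string s)
{
    var sb = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] != '\\') { sb.Append(s[i]); continue; }
        if (++i >= s.Length) throw new FormatException("Dangling backslash at end of input.");

        switch (s[i])
        {
            case '\'': sb.Append('\''); break;
            case '"':  sb.Append('"');  break;
            case '\\': sb.Append('\\'); break;
            case '0':  sb.Append('\0'); break;
            case 'a':  sb.Append('\a'); break;
            case 'b':  sb.Append('\b'); break;
            case 'f':  sb.Append('\f'); break;
            case 'n':  sb.Append('\n'); break;
            case 'r':  sb.Append('\r'); break;
            case 't':  sb.Append('\t'); break;
            case 'v':  sb.Append('\v'); break;
            // AppendHex returns the index just past the digits; the loop's i++ moves there.
            case 'x': i = AppendHex(s, i + 1, 1, 4, sb) - 1; break; // 1 to 4 hex digits (greedy)
            case 'u': i = AppendHex(s, i + 1, 4, 4, sb) - 1; break; // exactly 4 hex digits
            case 'U': i = AppendHex(s, i + 1, 8, 8, sb) - 1; break; // exactly 8, may yield a surrogate pair
            default: throw new FormatException($"Unrecognized escape sequence '\\{s[i]}'.");
        }
    }
    return sb.ToString();
}

// Reads minDigits..maxDigits hex digits starting at 'start', appends the resulting
// character(s) to sb and returns the index just past the last digit consumed.
static int AppendHex(string s, int start, int minDigits, int maxDigits, StringBuilder sb)
{
    int end = start;
    while (end < s.Length && end - start < maxDigits && Uri.IsHexDigit(s[end])) end++;
    if (end - start < minDigits) throw new FormatException("Too few hex digits in escape sequence.");

    int code = int.Parse(s.Substring(start, end - start), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    if (maxDigits <= 4)
        sb.Append((char)code);                  // \x and \u: a single UTF-16 code unit
    else
        sb.Append(char.ConvertFromUtf32(code)); // \U: throws ArgumentOutOfRangeException for invalid code points
    return end;
}
```

As with the compiler, `\x` here is greedy: `@"\x41BC"` is read as the single character U+41BC, not `A` followed by `BC`.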

Bas
  • I know it is a complex task; that is why I am looking for something ready to use rather than reinventing it. I am still hoping to use Roslyn for this, although that would be a heavyweight solution :-) – greenoldman Oct 27 '14 at 14:45