Question (score 225)

In C#, can I convert a string value to a string literal, the way I would see it in code? I would like to replace tabs, newlines, etc. with their escape sequences.

If this code:

Console.WriteLine(someString);

produces:

    Hello
    World!

I want this code:

Console.WriteLine(ToLiteral(someString));

to produce:

\tHello\r\n\tWorld!\r\n
Asked by Hallgrim (edited by Peter Mortensen)

16 Answers

Answer (score 206)

A long time ago, I found this:

using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;

private static string ToLiteral(string input)
{
    using (var writer = new StringWriter())
    {
        using (var provider = CodeDomProvider.CreateProvider("CSharp"))
        {
            provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, null);
            return writer.ToString();
        }
    }
}

This code:

var input = "\tHello\r\n\tWorld!";
Console.WriteLine(input);
Console.WriteLine(ToLiteral(input));

Produces:

    Hello
    World!
"\tHello\r\n\tWorld!"

These days, Graham discovered you can use Roslyn's Microsoft.CodeAnalysis.CSharp package on NuGet:

private static string ToLiteral(string valueTextForCompiler)
{
    return Microsoft.CodeAnalysis.CSharp.SymbolDisplay.FormatLiteral(valueTextForCompiler, false);
}
Answered by Hallgrim
  • Just found this by googling the subject. This has to be the best; no point in reinventing stuff that .NET can do for us. – Andy Morris Jan 19 '10 at 13:58
  • Nice one, but be aware that for longer strings, this will insert "+" operators, newlines and indentation. I couldn't find a way to turn that off. – Timwi May 04 '10 at 21:49
  • Oh, also: the ".GetStringBuilder()" is redundant. You can just use .ToString() directly on a StringWriter. – Timwi May 04 '10 at 21:50
  • This doesn't even take care of \f. – Ronnie Overby Aug 10 '12 at 16:15
  • What about the inverse? If you have a file with text containing escape sequences, including special characters escaped with their ASCII code, how do you produce a raw version? – Luciano Nov 29 '12 at 16:57
  • If you run: void Main() { Console.WriteLine(ToLiteral("test \"\'\\\0\a\b\f\n\r\t\v\uaaaa \\\blah")); } you'll notice that this doesn't take care of a few escapes. Ronnie Overby pointed out \f; the others are \a and \b. – boggy Feb 01 '13 at 21:34
  • My weekend project: an implementation of this routine on an MVC form. If you only need to do this occasionally, you can hit my page at http://csharpstringescape.apphb.com/ (code is on GitHub). – JoshRivers Sep 18 '13 at 18:22
  • Is there a way to make it output verbatim (`@"..."`) literals? – rookie1024 Mar 27 '16 at 20:35
  • @rookie1024 I'm interested as well. Using this gives me mixed results: some are `@\"..\"`, some are just `\"...\"`. – interesting-name-here Jun 29 '17 at 16:50
  • I put an answer at the bottom to attempt to add a verbatim version of Hallgrim's answer. I'm guessing it isn't perfect; I'll use it for a while and see how it works. – Derek Nov 02 '17 at 11:16
  • Don't forget `using System.CodeDom;` and `using System.CodeDom.Compiler;` at the top of the code! – Shrout1 Jan 09 '19 at 20:55
  • Didn't work for everything for me, including `\a` and `\u0004`. [Graham's answer](https://stackoverflow.com/a/58825732) is the only one which worked properly for me. – Dan May 13 '22 at 18:55
  • Don't know what this is, but it does not work. – IvanP Oct 21 '22 at 14:43
Answer (score 49)

Use Regex.Escape(String):

Regex.Escape escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes.
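A small comparison (plain .NET, no extra packages) shows where this does and does not line up with a C# literal: whitespace control characters come out as \t, \r and \n, which happen to match C#, but spaces and regex metacharacters also get a backslash, and a double quote is left untouched, so the result is generally not a valid C# string literal. The class name is only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexEscapeDemo
{
    public static void Main()
    {
        // Tabs, CR and LF become \t, \r, \n: the same spelling C# uses
        Console.WriteLine(Regex.Escape("\tHello\r\n\tWorld!"));   // \tHello\r\n\tWorld!

        // Spaces and '?' are escaped too; "\ " and "\?" are not valid C# escapes
        Console.WriteLine(Regex.Escape("Hello World?"));          // Hello\ World\?

        // A double quote is not escaped at all, so this cannot be pasted between quotes
        Console.WriteLine(Regex.Escape("say \"hi\""));
    }
}
```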

Answered by Shqdooow (edited by Peter Mortensen)
  • +1, no idea why this is so far down. Other answers are just too verbose and look like reinventing the wheel. – Adriano Carneiro Jul 10 '14 at 22:38
  • This is not what OP is asking for. It doesn't return a string literal; it returns a string with Regex special characters escaped. This would turn `Hello World?` into `Hello\ World\?`, but that is an invalid string literal. – atheaos May 22 '15 at 20:00
  • I agree with @atheaos; this is a great answer to a very different question. – hypehuman Jul 31 '15 at 20:58
  • +1 even though it doesn't quite answer the OP's question, it was what I (and so I suspect maybe others) was looking for when I came across this question. :) – GazB Jun 08 '16 at 15:29
  • This will not work as needed. The regex special characters are not the same. It will work for \n, for example, but a space will be converted to "\ ", which is not what C# would do... – Ernesto Sep 23 '19 at 18:37
Answer (score 33)

There's a method for this in Roslyn's Microsoft.CodeAnalysis.CSharp package on NuGet:

private static string ToLiteral(string valueTextForCompiler)
{
    return Microsoft.CodeAnalysis.CSharp.SymbolDisplay.FormatLiteral(valueTextForCompiler, false);
}

Obviously, this didn't exist at the time of the original question, but it might help people who end up here from Google Search.

Answered by Graham (edited by Peter Mortensen)
Answer (score 30)

This is a fully working implementation, including escaping of Unicode and ASCII non-printable characters. It does not insert "+" signs like Hallgrim's answer.

using System.Text;

static string ToLiteral(string input) {
    StringBuilder literal = new StringBuilder(input.Length + 2);
    literal.Append("\"");
    foreach (var c in input) {
        switch (c) {
            case '\"': literal.Append("\\\""); break;
            case '\\': literal.Append(@"\\"); break;
            case '\0': literal.Append(@"\0"); break;
            case '\a': literal.Append(@"\a"); break;
            case '\b': literal.Append(@"\b"); break;
            case '\f': literal.Append(@"\f"); break;
            case '\n': literal.Append(@"\n"); break;
            case '\r': literal.Append(@"\r"); break;
            case '\t': literal.Append(@"\t"); break;
            case '\v': literal.Append(@"\v"); break;
            default:
                // ASCII printable character
                if (c >= 0x20 && c <= 0x7e) {
                    literal.Append(c);
                // As UTF16 escaped character
                } else {
                    literal.Append(@"\u");
                    literal.Append(((int)c).ToString("x4"));
                }
                break;
        }
    }
    literal.Append("\"");
    return literal.ToString();
}

Note that this also escapes all non-ASCII characters, printable or not. If your environment supports them, you could change that part to escape only control characters:

// UTF16 control characters (UnicodeCategory is in System.Globalization)
} else if (Char.GetUnicodeCategory(c) == UnicodeCategory.Control) {
    literal.Append(@"\u");
    literal.Append(((int)c).ToString("x4"));
} else {
    literal.Append(c);
}
Answered by Smilediver
  • You should use `Char.GetUnicodeCategory(c) == UnicodeCategory.Control` to decide whether to escape it, or people who don't speak ASCII won't be very happy. – deerchao Jan 24 '13 at 13:15
  • This depends on whether the environment where your resulting string will be used supports Unicode or not. – Smilediver Jan 29 '13 at 13:59
  • I added `input = input ?? string.Empty;` as the first line of the method so I could pass `null` and get back `""` instead of a null reference exception. – Andy Jan 08 '17 at 19:23
  • Nice. Change the enclosing quotes to `'` and now you have what Python gives you out of the box with `repr(a_string)` :). – z33k Nov 07 '19 at 12:14
  • Why did you escape `'`, as that is not necessary? – trinalbadger587 Aug 17 '21 at 03:05
  • @trinalbadger587 I don't remember, it was 10 years ago. :) But you're right, escaping `'` is most likely unnecessary. – Smilediver Aug 20 '21 at 11:42
  • @Smilediver, you should edit your answer then. – trinalbadger587 Aug 20 '21 at 19:13
Answer (score 26)

A more structured approach, including all escape sequences for strings and chars, is:

It doesn't replace Unicode characters with their literal equivalent. It doesn't cook eggs, either.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class ReplaceString
{
    static readonly IDictionary<string, string> m_replaceDict
        = new Dictionary<string, string>();

    // \x00 is included so the "\0" entry in m_replaceDict can actually be matched
    const string ms_regexEscapes = @"[\a\b\f\n\r\t\v\x00\\""]";

    public static string StringLiteral(string i_string)
    {
        return Regex.Replace(i_string, ms_regexEscapes, match);
    }

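    public static string CharLiteral(char c)
    {
        // ' must be escaped in a char literal; other special characters reuse the string escapes
        if (c == '\'') return @"'\''";
        string escaped;
        return "'" + (m_replaceDict.TryGetValue(c.ToString(), out escaped) ? escaped : c.ToString()) + "'";
    }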

    private static string match(Match m)
    {
        string match = m.ToString();
        if (m_replaceDict.ContainsKey(match))
        {
            return m_replaceDict[match];
        }

        throw new NotSupportedException();
    }

    static ReplaceString()
    {
        m_replaceDict.Add("\a", @"\a");
        m_replaceDict.Add("\b", @"\b");
        m_replaceDict.Add("\f", @"\f");
        m_replaceDict.Add("\n", @"\n");
        m_replaceDict.Add("\r", @"\r");
        m_replaceDict.Add("\t", @"\t");
        m_replaceDict.Add("\v", @"\v");

        m_replaceDict.Add("\\", @"\\");
        m_replaceDict.Add("\0", @"\0");

        //The SO parser gets fooled by the verbatim version
        //of the string to replace - @"\"""
        //so use the 'regular' version
        m_replaceDict.Add("\"", "\\\"");
    }

    static void Main(string[] args){

        string s = "here's a \"\n\tstring\" to test";
        Console.WriteLine(ReplaceString.StringLiteral(s));
        Console.WriteLine(ReplaceString.CharLiteral('c'));
        Console.WriteLine(ReplaceString.CharLiteral('\''));

    }
}
Answered by Cristian Diaconescu (edited by Peter Mortensen)
Answer (score 21)

Try HttpUtility.JavaScriptStringEncode (namespace System.Web):

var t = HttpUtility.JavaScriptStringEncode(s);
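A small sketch of what this produces for the string from the question, assuming HttpUtility is available (System.Web on .NET Framework; recent .NET versions ship it in System.Web.HttpUtility.dll). No surrounding quotes are added unless you use the overload with addDoubleQuotes. Because the method targets JavaScript, some characters that C# would leave alone, such as ', <, > and &, may come out as \uXXXX escapes; those are still valid inside a C# string literal.

```csharp
using System;
using System.Web; // HttpUtility

public static class JsEncodeDemo
{
    public static void Main()
    {
        string s = "\tHello\r\n\tWorld!\r\n";

        // Escapes the tabs and CR/LF pairs; no surrounding quotes are added
        Console.WriteLine(HttpUtility.JavaScriptStringEncode(s));       // \tHello\r\n\tWorld!\r\n

        // addDoubleQuotes: true wraps the escaped text in double quotes
        Console.WriteLine(HttpUtility.JavaScriptStringEncode(s, true)); // "\tHello\r\n\tWorld!\r\n"
    }
}
```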
Answered by Arsen Zahray (edited by Peter Mortensen)
  • Does not work. If I have "abc\n123" (without quotes, 8 chars), I want "abc" + \n + "123" (7 chars). Instead it produces "abc" + "\\" + "\n123" (9 chars). Notice the slash was doubled and it still contains a string literal of "\n" as two characters, not the escaped character. – Paul Mar 07 '12 at 20:13
  • @Paul What you want is the opposite of what the question is asking, though. This, according to your description, answers the question, and therefore _does_ work. – Nic Jan 04 '17 at 20:19
  • I found this useful to escape Active Directory names in the front end. – chakeda Oct 17 '17 at 17:37
Answer (score 19)

Hallgrim's answer is excellent, but the "+", newline and indent additions were breaking functionality for me. An easy way around it is:

using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;

private static string ToLiteral(string input)
{
    using (var writer = new StringWriter())
    {
        using (var provider = CodeDomProvider.CreateProvider("CSharp"))
        {
            provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, new CodeGeneratorOptions {IndentString = "\t"});
            var literal = writer.ToString();
            literal = literal.Replace(string.Format("\" +{0}\t\"", Environment.NewLine), "");
            return literal;
        }
    }
}
Answered by lesur (edited by Peter Mortensen)
  • Works great. I also added one line before the `return literal` to make it more readable: `literal = literal.Replace("\\r\\n", "\\r\\n\"+\r\n\"");` – Bob May 08 '13 at 17:54
  • Added this `literal = literal.Replace("/", @"\/");` for `JSON` functionality. – interesting-name-here Jun 29 '17 at 15:32
  • This is 100% straightforward and the only correct answer! All other answers either didn't understand the question or reinvented the wheel. – bytecode77 Dec 27 '17 at 13:47
  • Sad, I cannot get this to work under .NET Core. Does anyone have a better answer? – s k Feb 06 '18 at 08:33
Answer (score 19)
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringHelpers
{
    private static Dictionary<string, string> escapeMapping = new Dictionary<string, string>()
    {
        {"\"", @"\"""},
        {"\\\\", @"\\"},
        {"\a", @"\a"},
        {"\b", @"\b"},
        {"\f", @"\f"},
        {"\n", @"\n"},
        {"\r", @"\r"},
        {"\t", @"\t"},
        {"\v", @"\v"},
        {"\0", @"\0"},
    };

    private static Regex escapeRegex = new Regex(string.Join("|", escapeMapping.Keys.ToArray()));

    public static string Escape(this string s)
    {
        return escapeRegex.Replace(s, EscapeMatchEval);
    }

    private static string EscapeMatchEval(Match m)
    {
        if (escapeMapping.ContainsKey(m.Value))
        {
            return escapeMapping[m.Value];
        }
        return escapeMapping[Regex.Escape(m.Value)];
    }
}
Answered by ICR (edited by William Jockusch)
Answer (score 10)

Here is a small improvement on Smilediver's answer. It does not escape all non-ASCII characters, only the ones that really need it (control characters).

using System;
using System.Globalization;
using System.Text;

public static class CodeHelper
{
    public static string ToLiteral(this string input)
    {
        var literal = new StringBuilder(input.Length + 2);
        literal.Append("\"");
        foreach (var c in input)
        {
            switch (c)
            {
                case '\'': literal.Append(@"\'"); break;
                case '\"': literal.Append("\\\""); break;
                case '\\': literal.Append(@"\\"); break;
                case '\0': literal.Append(@"\0"); break;
                case '\a': literal.Append(@"\a"); break;
                case '\b': literal.Append(@"\b"); break;
                case '\f': literal.Append(@"\f"); break;
                case '\n': literal.Append(@"\n"); break;
                case '\r': literal.Append(@"\r"); break;
                case '\t': literal.Append(@"\t"); break;
                case '\v': literal.Append(@"\v"); break;
                default:
                    if (Char.GetUnicodeCategory(c) != UnicodeCategory.Control)
                    {
                        literal.Append(c);
                    }
                    else
                    {
                        literal.Append(@"\u");
                        literal.Append(((ushort)c).ToString("x4"));
                    }
                    break;
            }
        }
        literal.Append("\"");
        return literal.ToString();
    }
}
Answered by deerchao (edited by Peter Mortensen)
Answer (score 8)

Interesting question.

If you can't find a better method, you can always fall back to plain string replacement.
If you opt for that, you could use this C# escape sequence list:

  • \' - single quote, needed for character literals
  • \" - double quote, needed for string literals
  • \\ - backslash
  • \0 - Unicode character 0
  • \a - Alert (character 7)
  • \b - Backspace (character 8)
  • \f - Form feed (character 12)
  • \n - New line (character 10)
  • \r - Carriage return (character 13)
  • \t - Horizontal tab (character 9)
  • \v - Vertical tab (character 11)
  • \uxxxx - Unicode escape sequence for character with hex value xxxx
  • \xn[n][n][n] - Unicode escape sequence for character with hex value nnnn (variable length version of \uxxxx)
  • \Uxxxxxxxx - Unicode escape sequence for character with hex value xxxxxxxx (for generating surrogates)

This list can be found in the C# Frequently Asked Questions entry "What character escape sequences are available?"
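A minimal sketch of the replace approach, using a hypothetical helper named `ToLiteral` and covering only the fixed escapes from the list above (other control characters are not turned into \u sequences). The one subtle point is ordering: the backslash has to be escaped first, otherwise the backslashes added by the later replacements would be doubled again.

```csharp
using System;

public static class ReplaceEscaper
{
    // Hypothetical helper built from the escape list; string.Replace(string, string)
    // does an ordinal search, so control characters such as "\0" are matched exactly.
    public static string ToLiteral(string input)
    {
        string body = input
            .Replace("\\", @"\\")   // must come first
            .Replace("\"", "\\\"")
            .Replace("\0", @"\0")
            .Replace("\a", @"\a")
            .Replace("\b", @"\b")
            .Replace("\f", @"\f")
            .Replace("\n", @"\n")
            .Replace("\r", @"\r")
            .Replace("\t", @"\t")
            .Replace("\v", @"\v");
        return "\"" + body + "\"";
    }

    public static void Main()
    {
        Console.WriteLine(ToLiteral("\tHello\r\n\tWorld!\r\n")); // "\tHello\r\n\tWorld!\r\n"
    }
}
```

Each Replace call scans the whole string, which is fine for occasional use; the StringBuilder-based answers on this page do it in a single pass.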

Answered by Nelson Reis (edited by Gert van den Berg)
Answer (score 4)

If JSON escaping conventions are good enough for the strings you want to escape, and you already use Json.NET (Newtonsoft.Json) in your project (it has a pretty large overhead to add just for this), you can serialize the string; the result is escaped and wrapped in double quotes:

using System;
using Newtonsoft.Json;

public class Program
{
    public static void Main()
    {
        Console.WriteLine(ToLiteral("abc\n123")); // prints "abc\n123"
    }

    private static string ToLiteral(string input)
    {
        // SerializeObject escapes the string; DeserializeObject<string> would do the reverse
        return JsonConvert.SerializeObject(input);
    }
}
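If you'd rather not pull in Json.NET, escaping via JSON serialization can be sketched with System.Text.Json, which ships with .NET Core 3.0 and later (swapped in here; it is not what this answer uses). JSON's short escapes (\", \\, \b, \f, \n, \r, \t) and \uXXXX are all valid C# escapes, so the serialized string can be pasted into C# source. The default encoder is conservative, though: characters such as ", <, > and & and all non-ASCII text come out as \uXXXX sequences, which C# accepts but you would not type by hand. The class name is illustrative.

```csharp
using System;
using System.Text.Json;

public static class JsonLiteral
{
    // Serializing a string yields a quoted JSON string. With default options,
    // \n, \r, \t, \b, \f and \\ use their short forms; other characters that
    // need escaping are written as \uXXXX.
    public static string ToLiteral(string input) => JsonSerializer.Serialize(input);

    public static void Main()
    {
        Console.WriteLine(ToLiteral("\tHello\r\n\tWorld!\r\n")); // "\tHello\r\n\tWorld!\r\n"
        Console.WriteLine(ToLiteral("say \"hi\""));
    }
}
```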
Answered by Ehsan88 (edited by Peter Mortensen)
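Answer (score 2)

The following extension method escapes the control characters U+0000 to U+001F, the double quote and the backslash, without adding surrounding quotes. It uses IndexOfAny to find the next character that needs escaping and copies the runs in between in one go: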
public static class StringEscape
{
  static char[] toEscape = "\0\x1\x2\x3\x4\x5\x6\a\b\t\n\v\f\r\xe\xf\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\"\\".ToCharArray();
  static string[] literals = @"\0,\x0001,\x0002,\x0003,\x0004,\x0005,\x0006,\a,\b,\t,\n,\v,\f,\r,\x000e,\x000f,\x0010,\x0011,\x0012,\x0013,\x0014,\x0015,\x0016,\x0017,\x0018,\x0019,\x001a,\x001b,\x001c,\x001d,\x001e,\x001f".Split(new char[] { ',' });

  public static string Escape(this string input)
  {
    int i = input.IndexOfAny(toEscape);
    if (i < 0) return input;

    var sb = new System.Text.StringBuilder(input.Length + 5);
    int j = 0;
    do
    {
      sb.Append(input, j, i - j);
      var c = input[i];
      if (c < 0x20) sb.Append(literals[c]); else sb.Append(@"\").Append(c);
    } while ((i = input.IndexOfAny(toEscape, j = ++i)) > 0);

    return sb.Append(input, j, input.Length - j).ToString();
  }
}
Answered by Serge N
  • An explanation would be in order. E.g., what is the idea/gist? E.g., is it due to performance considerations? Please respond by editing your answer, not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Jun 22 '21 at 16:50
Answer (score 2)

My attempt at adding ToVerbatim to Hallgrim's accepted answer:

using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;

private static string ToLiteral(string input)
{
    using (var writer = new StringWriter())
    {
        using (var provider = CodeDomProvider.CreateProvider("CSharp"))
        {
            provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, new CodeGeneratorOptions { IndentString = "\t" });
            var literal = writer.ToString();
            literal = literal.Replace(string.Format("\" +{0}\t\"", Environment.NewLine), "");
            return literal;
        }
    }
}

private static string ToVerbatim(string input)
{
    string literal = ToLiteral(input);
    string verbatim = "@" + literal.Replace(@"\r\n", Environment.NewLine);
    return verbatim;
}
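The ToVerbatim above only turns \r\n back into real line breaks; other escapes such as \\, \" and \t stay in backslash form, and those mean something different inside @"..." (a \" even ends the literal early). A simpler route is to build the verbatim literal from the raw input, since the only character that needs escaping there is the double quote, written twice. A minimal sketch with a hypothetical class name; note that tabs, NULs and other control characters end up embedded raw in the source, which some editors may not preserve.

```csharp
using System;

public static class VerbatimLiteral
{
    // In a verbatim literal every character stands for itself except the
    // double quote, which is doubled. Working from the raw input avoids having
    // to undo the escapes that ToLiteral produced.
    public static string ToVerbatim(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return "@\"" + input.Replace("\"", "\"\"") + "\"";
    }

    public static void Main()
    {
        Console.WriteLine(ToVerbatim("C:\\temp\\\"new\".txt")); // @"C:\temp\""new"".txt"
    }
}
```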
Answered by Derek (edited by Peter Mortensen)
Answer (score 1)

Hallgrim's answer was excellent. Here's a small tweak in case you need to strip out the additional white space characters and line breaks with a C# regular expression. I needed this for a serialized JSON value to be inserted into Google Sheets, and ran into trouble because the code was inserting tabs, + signs, spaces, etc.

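private static string ToLiteral(string input)
{
    using (var writer = new StringWriter())
    using (var provider = CodeDomProvider.CreateProvider("CSharp"))
    {
        provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, null);
        var r2 = new Regex(@"\"" \+\r?\n\s+\""", RegexOptions.ECMAScript); // '" +', line break, indent, '"'
        return r2.Replace(writer.ToString(), "");
    }
}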
Answered by Xelnath (edited by Peter Mortensen)
Answer (score -1)

I submit my own implementation, which handles null values and should be more performant on account of using array lookup tables, manual hex conversion, and avoiding switch statements.

using System;
using System.Text;
using System.Linq;

public static class StringLiteralEncoding {
  private static readonly char[] HEX_DIGIT_LOWER = "0123456789abcdef".ToCharArray();
  private static readonly char[] LITERALENCODE_ESCAPE_CHARS;

  static StringLiteralEncoding() {
    // Per http://msdn.microsoft.com/en-us/library/h21280bw.aspx
    // ('?' is left out: \? is a C/C++ escape and is not valid in C#)
    var escapes = new string[] { "\aa", "\bb", "\ff", "\nn", "\rr", "\tt", "\vv", "\"\"", "\\\\", "\00" };
    LITERALENCODE_ESCAPE_CHARS = new char[escapes.Max(e => e[0]) + 1];
    foreach(var escape in escapes)
      LITERALENCODE_ESCAPE_CHARS[escape[0]] = escape[1];
  }

  /// <summary>
  /// Convert the string to the equivalent C# string literal, enclosing the string in double quotes and inserting
  /// escape sequences as necessary.
  /// </summary>
  /// <param name="s">The string to be converted to a C# string literal.</param>
  /// <returns><paramref name="s"/> represented as a C# string literal.</returns>
  public static string Encode(string s) {
    if(null == s) return "null";

    var sb = new StringBuilder(s.Length + 2).Append('"');
    for(var rp = 0; rp < s.Length; rp++) {
      var c = s[rp];
      if(c < LITERALENCODE_ESCAPE_CHARS.Length && '\0' != LITERALENCODE_ESCAPE_CHARS[c])
        sb.Append('\\').Append(LITERALENCODE_ESCAPE_CHARS[c]);
      else if('~' >= c && c >= ' ')
        sb.Append(c);
      else
        sb.Append(@"\x")
          .Append(HEX_DIGIT_LOWER[c >> 12 & 0x0F])
          .Append(HEX_DIGIT_LOWER[c >>  8 & 0x0F])
          .Append(HEX_DIGIT_LOWER[c >>  4 & 0x0F])
          .Append(HEX_DIGIT_LOWER[c       & 0x0F]);
    }

    return sb.Append('"').ToString();
  }
}
Answered by J Cracknell
Answer (score -10)

Code:

string someString1 = "\tHello\r\n\tWorld!\r\n";
string someString2 = @"\tHello\r\n\tWorld!\r\n";

Console.WriteLine(someString1);
Console.WriteLine(someString2);

Output:

    Hello
    World!

\tHello\r\n\tWorld!\r\n
Answered by rfgamaral (edited by Peter Mortensen)
  • I have someString1, but it is read from a file. I want it to appear as someString2 after calling some method. – Hallgrim Nov 27 '08 at 21:51
  • The string may be dynamically created or obtained; he needs a method that handles any string. – rufw91 Nov 30 '21 at 08:42