
I am writing some text (which includes \n and \t characters), taken from one source file, to another text file. For example:

source file (test.cpp):

/*
 * test.cpp
 *
 *    2013.02.30
 *
 */

This comment block is read from the source file and stored in a string variable like so:

test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"

When I write it to a file using

    with open('test.cpp', 'a') as out:
        print(test_str, file=out)

the newline and tab characters are written as actual line breaks and tabs (exactly as test.cpp had them), whereas I want them to appear as the literal text \n and \t, just as they look in the test_str assignment above.

Is there a way in Python to write these 'special characters' to a file without them being translated?

Yannis
  • Did you try escaping the backslash in each special character, e.g. `"\n"` --> `"\\n"`? See this [post](http://stackoverflow.com/questions/4245709/how-do-you-write-special-characters-n-b-to-a-file-in-python) – terence hill May 01 '16 at 20:58
  • @terencehill I was aware that such string manipulation could meet my needs, but I was hoping for something more subtle and/or built-in; the `encode` method suggested by Jon [below](http://stackoverflow.com/a/36971942/3286832) seems perfect for this. – Yannis May 02 '16 at 08:58

4 Answers


Use replace(). And since you need to use it multiple times, you might want to look at this.

test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"
with open("somefile", "w") as f:
    test_str = test_str.replace('\n','\\n')
    test_str = test_str.replace('\t','\\t')
    f.write(test_str)
quapka
  • Very useful. I was hoping for something more subtle and/or built-in. In particular, the regex approach might be overkill for my case, but it is still useful. – Yannis May 02 '16 at 08:47
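
One detail the `replace()` approach leaves open is what happens when the text already contains a backslash, for example a C++ string such as `"a\nb"` inside the source being copied. A minimal sketch, assuming the output should be unambiguous and reversible, escapes backslashes first; the helper name `escape_whitespace` is hypothetical and not part of the answer above:

```python
def escape_whitespace(text):
    """Turn newlines and tabs into the two-character sequences \\n and \\t."""
    # Escape existing backslashes first, so a backslash followed by 'n'
    # in the input cannot later be mistaken for an escaped newline.
    text = text.replace('\\', '\\\\')
    text = text.replace('\n', '\\n')
    text = text.replace('\t', '\\t')
    return text


test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"
escaped = escape_whitespace(test_str)
print(escaped)
```

If reversibility does not matter, the first `replace` call can be dropped and the result matches the answer's code.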

You can use str.encode:

with open('test.cpp', 'a') as out:
    print(test_str.encode('unicode_escape').decode('utf-8'), file=out)

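This escapes every character that Python would write as an escape sequence in a string literal, which includes backslashes and all non-ASCII characters as well as \n and \t.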

Given your example:

>>> test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"
>>> test_str.encode('unicode_escape')
b'/*\\n test.cpp\\n *\\n *\\n *\\n\\t2013.02.30\\n *\\n */\\n'
Jon Clements
  • Seems exactly what I was hoping Python had built-in and fits my needs. Could you please explain the purpose of `decode('utf-8')`, especially when on your example just the `encode('unicode_escape')` gives the solution? – Yannis May 02 '16 at 08:49
  • And for clarity, if I wanted to reverse the effect, to get the newline and tab characters back as they were originally (before the `str.encode`), how would I achieve that? – Yannis May 02 '16 at 09:07
  • @Yannis the encoding gives you a byte string (notice the `b` prefix and the output in the file when printed) - decoding it gives you back a unicode string. – Jon Clements May 02 '16 at 13:47
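
The reversal asked about in the comments can be sketched as follows. This is only an illustration, and it assumes the escaped text was produced by `unicode_escape` in the first place (so it is pure ASCII); the variable names are made up for the example:

```python
test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"

# str -> bytes: each special character becomes a backslash escape (all ASCII).
escaped_bytes = test_str.encode('unicode_escape')

# bytes -> str, so the result can be passed to print() or a text-mode file.
escaped_text = escaped_bytes.decode('ascii')

# Reverse direction: back to ASCII bytes, then interpret the escapes again.
restored = escaped_text.encode('ascii').decode('unicode_escape')

print(escaped_text)
print(restored == test_str)  # True
```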

I want them to remain \n and \t exactly like the test_str variable holds them in the first place.

test_str does NOT contain a backslash followed by t (two characters). It contains a single tab character, ord('\t') == 9 (the same character as in test.cpp). The backslash is special in Python string literals, e.g. u'\U0001f600' is NOT ten characters; it is a single character. Don't confuse a string object in memory at runtime with its text representation as a string literal in Python source code.
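
A few lines in Python 3 make the literal-versus-object distinction concrete. The variable names are only for illustration:

```python
tab = '\t'    # escape sequence in the literal: one character in memory
two = '\\t'   # escaped backslash: a backslash followed by the letter t
raw = r'\t'   # raw literal: the same two characters as `two`

print(len(tab), ord(tab))   # 1 9
print(len(two), len(raw))   # 2 2
print(two == raw)           # True
print(len('\U0001f600'))    # 1

# repr() shows how the object would be written as a literal, not what it holds.
print(repr(tab))
```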

JSON could be a better (more portable) alternative to the unicode-escape encoding for storing text, i.e., use:

import json

with open('test.json', 'w') as file:
    json.dump({'test.cpp': test_str}, file)

instead of test_str.encode('unicode_escape').decode('ascii').

To read json back:

with open('test.json') as file:
    test_str = json.load(file)['test.cpp']
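
To see what actually ends up in the file, the same round trip can be sketched in memory with `json.dumps` and `json.loads`. The `ensure_ascii=False` line is an optional extra, not part of the answer above; it keeps non-ASCII characters readable while still escaping \n and \t:

```python
import json

test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"

# Serialise: newlines and tabs become the two-character escapes \n and \t.
encoded = json.dumps({'test.cpp': test_str})
print(encoded)

# Deserialise: the escapes turn back into real newline and tab characters.
decoded = json.loads(encoded)['test.cpp']
print(decoded == test_str)  # True

# With ensure_ascii=False, non-ASCII text is written as-is.
print(json.dumps('Français\n', ensure_ascii=False))
```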
jfs
  • Explaining the difference between string object in memory during runtime and Python's string literal is greatly appreciated. JSON's portability is good, but for my case not an issue. – Yannis May 02 '16 at 11:09

Using str.encode with 'unicode_escape', as suggested in Jon Clements' answer, is not a good solution because it escapes all non-ASCII characters, which gives bad results for text in anything other than English:

>>> t = 'English text\tTexte en Français\nنص بالعربية\t中文文本\n'
>>> t
'English text\tTexte en Français\nنص بالعربية\t中文文本\n'
>>> t.encode('unicode_escape').decode('utf-8')
'English text\\tTexte en Fran\\xe7ais\\n\\u0646\\u0635 \\u0628\\u0627\\u0644\\u0639\\u0631\\u0628\\u064a\\u0629\\t\\u4e2d\\u6587\\u6587\\u672c\\n'

As you can see, everything outside ASCII has been turned into escape sequences, which is not the expected behaviour. Note, however, that the Python console does not have this problem and displays non-ASCII characters perfectly.

To achieve something similar to what the Python console does, use the following code:

>>> repr(t).strip("'")
'English text\\tTexte en Français\\nنص بالعربية\\t中文文本\\n'

repr(t) does everything cleanly except that it wraps the text in quote marks, so we remove them using .strip("'"). Note that this assumes repr chose single quotes and that the text itself neither starts nor ends with one.
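
A slightly more defensive variant, sketched here as an assumption rather than as part of the original answer, slices off exactly one character at each end instead of stripping. The helper name `visible_escapes` is hypothetical:

```python
def visible_escapes(text):
    """Return repr(text) without its single pair of surrounding quotes."""
    return repr(text)[1:-1]


t = 'Français\t中文\n'
print(visible_escapes(t))

# When the text contains a single quote, repr() switches to double quotes,
# so strip("'") finds nothing to remove and the quotes stay in the output.
quoted = "'quoted'"
print(repr(quoted).strip("'"))
print(visible_escapes(quoted))
```

Slicing still inherits one quirk from repr: if the text contains both kinds of quote, repr escapes the single quotes with a backslash, and that backslash remains in the result.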

Hamza Abbad