I want to use repr() to get a Python-encoded string literal (that I can paste into some source code), but I'd prefer a triple-quoted string with real newlines rather than the \n escape sequence.

I could post-process the string to convert \n back into a newline character and add a couple more quotes, but if the source contains a literal \\n (an escaped backslash followed by n), I wouldn't want to match that.

What's the easiest way to do this?


Example input:

foo💩
bar

Or as a Python string:

'foo💩\nbar'

Desired output:

'''foo\xf0\x9f\x92\xa9
bar'''

Triple-single or triple-double quotes are fine, but I do want it broken across multiple lines like that.


What I have so far:

#!/usr/bin/env python
import sys
import re

with open(sys.argv[1], 'r+') as f:
    data = f.read()
    f.seek(0)
    out = "''" + re.sub(r"\\n", '\n', repr(data)) + "''"
    f.write(out)
    f.truncate()

I'm still trying to figure out the regex to avoid converting escaped \ns.

The goal is that if I paste that back into a Python source file I will get back out exactly the same thing as I read in.
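
For the regex part, here is one possible sketch. It is written in Python 3 syntax rather than the 2.7 used in this question, and `repr_with_real_newlines` is a made-up name. The idea is to match every backslash together with the character after it, so that `\\` is consumed as a unit and an `n` following it is never mistaken for a newline escape. It assumes `repr()` returns a plain quoted string with no `u` or `b` prefix, and it also escapes any bare single quotes so they cannot close the triple-quoted literal early.

```python
import ast
import re


def _fix(match):
    """Rewrite one token found in the body of a repr() string."""
    token = match.group(0)
    if token == "\\n":   # a genuine newline escape: use a real newline
        return "\n"
    if token == "'":     # a bare quote (repr chose "..."): escape it
        return "\\'"
    return token         # any other escape pair, including \\, is kept


def repr_with_real_newlines(text):
    # Assumption: repr(text) has exactly one quote character on each side.
    body = repr(text)[1:-1]
    # r"\\." consumes a backslash plus the escaped character in one step,
    # so the "n" in an escaped-backslash-then-n sequence is never touched.
    return "'''" + re.sub(r"\\.|'", _fix, body) + "'''"


if __name__ == "__main__":
    sample = "foo\nbar \\n it's"
    literal = repr_with_real_newlines(sample)
    print(literal)
    assert ast.literal_eval(literal) == sample
```

Parsing the result with `ast.literal_eval` is a cheap way to check that the literal gives back the original text.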


I'm using Python 2.7.14

mpen
  • Isn't that just `print(your_string)`? I don't really get your desired input and output. – wim Mar 28 '19 at 22:56
  • @wim No. `repr` will escape quotes, emojis and other control characters, which I do want. – mpen Mar 28 '19 at 22:59
  • OK, please post an example input and output. Btw repr will not escape emojis in the current version of Python - maybe you should tag this with python-2.x ? – wim Mar 28 '19 at 22:59
  • @wim Added to question. – mpen Mar 28 '19 at 23:03
  • Are you really sure you want `'foo\nbar'` and not `u'foo\nbar'`? The proper escape here would be `foo\U0001f4a9\nbar` - what you are showing here is utf-8 encoded – wim Mar 28 '19 at 23:04
  • Uhh.. yeah, I think you're right. I don't actually have any emoji poops in my source, but there might be some other wonky stuff. I basically just need Python to be able to parse it and come out the same way as the input. – mpen Mar 28 '19 at 23:08

2 Answers


How about splitting it with splitlines() and encoding each line separately:

s = 'foo\nbar'

r = "'''" + '\n'.join(repr(x)[1:-1] for x in s.splitlines()) + "'''"

assert eval(r) == s

If you're on Python 2 and the inputs are unicode, use repr(x)[2:-1] instead to strip the leading u as well. The same applies to Python 3 with bytes inputs (to strip the leading b).
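
One caveat with the per-line approach: `splitlines()` drops a trailing newline, and it also splits on `\r` (and, for Unicode strings, on characters such as `\u2028`), which then come back as plain `\n`. If an exact round trip matters, a variant that splits only on `'\n'` may be safer. The following is a sketch in Python 3 syntax with a hypothetical helper name, `triple_quoted_lines`. It appends a double quote before calling `repr()` purely to force single-quote style, so that any `'` in the line comes out escaped.

```python
import ast


def triple_quoted_lines(text):
    """Build a '''...''' literal with real newlines that evaluates to text."""
    parts = []
    # split('\n') keeps a trailing empty piece and leaves \r untouched,
    # unlike splitlines().
    for line in text.split('\n'):
        # With a '"' in the string, repr() always picks '...' and escapes
        # every single quote. [1:-2] drops the opening quote, the added '"'
        # and the closing quote.
        parts.append(repr(line + '"')[1:-2])
    return "'''" + '\n'.join(parts) + "'''"


if __name__ == "__main__":
    for sample in ['foo\nbar', 'foo\nbar\n', 'a\r\nb', "it's'"]:
        assert ast.literal_eval(triple_quoted_lines(sample)) == sample
```

Carriage returns stay in the output as `\r` escapes, so a file with Windows line endings should still round-trip.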

georg

Final solution to convert a text file into a string which you can paste into your source code:

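#!/usr/bin/env python
import sys
import io

# Read the file as Unicode text, then overwrite it in place with a u'''...''' literal.
with io.open(sys.argv[1], 'r+', encoding='utf8') as f:
    data = f.read()
    f.seek(0)
    # On Python 2, repr() of a unicode line looks like u'...'; [2:-1] strips the u' prefix and the closing quote.
    out = u"u'''" + u'\n'.join(repr(x)[2:-1] for x in data.splitlines()) + u"'''"
    f.write(out)
    f.truncate()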

Warning: it overwrites the source file. I'm using temporary files for this, so that's what I wanted.
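
Since the file is overwritten, it may be worth checking that the generated literal really parses back to the original text before writing it. A minimal sketch, with a hypothetical helper name (`literal_round_trips`), that should behave the same on Python 2.7 and 3:

```python
import ast


def literal_round_trips(literal, original):
    """Return True if parsing literal as Python source yields original."""
    try:
        return ast.literal_eval(literal) == original
    except (SyntaxError, ValueError):
        # The literal is not valid Python source at all.
        return False


if __name__ == "__main__":
    print(literal_round_trips(u"u'''foo\nbar'''", u"foo\nbar"))
```

In the script above, one could skip the `f.write(out)` when this returns False and leave the file untouched.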

Credit:

mpen