41

I read in a string from a GUI textbox entered by the user and process it through pandoc. The string contains latex directives for math which have backslash characters. I want to send in the string as a raw string to pandoc for processing. But something like "\theta" becomes a tab and "heta".

How can I convert a string literal that contains backslash characters to a raw string...?

Edit:

Thanks develerx, flying sheep and unutbu. But none of the solutions seem to help me. The reason is that there are other backslashed-characters which do not have any effect in python but do have a meaning in latex.

For example '\lambda'. All the methods suggested produce

\\lambda

which does not go through in latex processing -- it should remain as \lambda.

Another edit:

If I can get this to work, I think I should be through. @Mark: All three methods give answers that I don't desire.

a='\nu + \lambda + \theta'; 
b=a.replace(r"\\",r"\\\\"); 
c='%r' %a; 
d=a.encode('string_escape');
print a

u + \lambda +   heta
print b

u + \lambda +   heta
print c
'\nu + \\lambda + \theta'
print d
\nu + \\lambda + \theta
Aran-Fey
Vijay Murthy
  • Are you sure the string really contains `\\lambda` and is not just doubling up when you print it? Try printing `mystring[1:]` and see if there is still a `\ ` in it. There should be some consistency - if `\t` is converting to tab then `\\ ` should convert to `\ `. – Mark Ransom Aug 31 '11 at 20:52
  • Can you post the `repr` of the string as received from the GUI textbox, and show the code you are using to process it through pandoc? – unutbu Aug 31 '11 at 20:59
  • Your test is unrealistic. You aren't getting it from a textbox, you're setting it with a string literal, and Python has already converted it in an inconsistent manner by the time it's assigned to `a`. It is impossible to get your original text back at that point. – Mark Ransom Aug 31 '11 at 21:21
  • My apologies. I was doing a silly error in reading the text from the GUI. The problem is now solved. Thanks for your comments and sorry for troubling you. – Vijay Murthy Aug 31 '11 at 21:36
  • @Vijay: So i was right with “your user input is for some arcane reason interpretting the backslashes, so you’ll need a way to tell it to stop that”? – flying sheep Sep 01 '11 at 16:28
  • Note that this question isn't exactly about raw strings; it's about escaping latex code. The OP mistakenly believed them to be the same thing. For a question that's *actually* about converting special characters into escape sequences, see [here](https://stackoverflow.com/q/2428117/1222951). – Aran-Fey Oct 10 '18 at 19:07

5 Answers

42

Python’s raw strings are just a way to tell the Python interpreter that it should treat backslashes in a string literal as literal backslashes instead of the start of an escape sequence. If you read strings entered by the user, they are already past the point where they could have been raw. Also, user input is most likely read in literally, i.e. “raw”.

This means the interpreting happens somewhere else. But if you know that it happens, why not escape the backslashes for whatever is interpreting it?

s = s.replace("\\", "\\\\")

(Note that you can't do r"\" as “a raw string cannot end in a single backslash”, but I could have used r"\\" as well for the second argument.)

If that doesn’t work, your user input is for some arcane reason interpreting the backslashes, so you’ll need a way to tell it to stop that.
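
To make the distinction concrete, here is a minimal sketch, assuming Python 3. The `io.StringIO` object is only a hypothetical stand-in for the GUI textbox; the point is that text read at runtime is never scanned for escape sequences, and that doubling is only needed if a later stage interprets backslashes itself.

```python
import io

# A raw literal and an ordinary literal with a doubled backslash
# describe exactly the same six-character string.
raw_literal = r"\theta"
escaped_literal = "\\theta"
print(raw_literal == escaped_literal, len(raw_literal))

# Simulated user input: the in-memory stream stands in for the textbox.
# Text read at runtime keeps its backslashes exactly as typed.
fake_textbox = io.StringIO(r"\nu + \lambda + \theta")
user_input = fake_textbox.read()
print(user_input)

# Only needed when whatever consumes the string interprets backslashes.
doubled = user_input.replace("\\", "\\\\")
print(doubled)
```

If `user_input` already shows tabs or newlines where `\t` or `\n` was typed, the interpretation happened before this point and cannot be undone by escaping afterwards.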

Qantas 94 Heavy
flying sheep
  • This is the first time I've seen the "raw string cannot end in a single backslash". I hadn't realized Python string parsing was so hacky - I thought the `r` prefix meant to stop treating backslashes as special, instead it means output both characters instead of interpreting them. – Mark Ransom Nov 17 '17 at 17:55
  • @MarkRansom yeah, f-strings are also just string postprocessing and not an actual subparser… – flying sheep Nov 24 '17 at 12:33
  • @MarkRansom But otherwise how would you add a `"` without closing the string literal? That's why they can't end in backslash, because it interprets it as the quote character, so the string hasn't finished yet. – Anakhand Aug 15 '18 at 09:12
  • @Anakhand I had just assumed you couldn't put a quote in a raw string. It's still kind of tough since you can't have one without a backslash in front of it. – Mark Ransom Aug 15 '18 at 13:28
17

If you want to convert an existing string to a raw string, you can reassign it like this:

s1 = "welcome\tto\tPython"
raw_s1 = "%r"%s1
print(raw_s1)

Will print (`%r` applies `repr`, so the surrounding quotes are part of the result)

'welcome\tto\tPython'
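
A short sketch, assuming Python 3, of what `%r` actually does here: it is the same as calling `repr`, so the result carries quotes and only happens to resemble the raw literal for this particular input. The second string `s2` is a hypothetical counter-example added for illustration.

```python
s1 = "welcome\tto\tPython"            # contains two real tab characters
raw_literal = r"welcome\tto\tPython"  # contains backslash + 't' instead

formatted = "%r" % s1
print(formatted == repr(s1))           # %r and repr() agree
print(formatted == raw_literal)        # differs: repr adds quotes
print(formatted[1:-1] == raw_literal)  # matches once quotes are stripped

# Counter-example: a string that already holds a backslash.
s2 = "a\\b"                            # three characters: a, backslash, b
print(len(s2), len(repr(s2)[1:-1]))    # repr doubles the backslash
```

Because `repr` escapes existing backslashes and quotes as well, stripping the quotes is not a general way to recover the text as it was typed.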
Barmar
prasad
  • I believe, at least in Python3, this will actually print out: 'welcome\\tto\\tPython' -- Including the single quotes. – disflux Sep 26 '17 at 14:30
  • @disflux I just tested it with Python 3.3.6 and it printed: `>>> s1 = "welcome\tto\tPython"` `>>> raw_s1 = "%r"%s1` `>>> print(raw_s1)` `'welcome\tto\tPython'` – Evandro Coan Nov 25 '17 at 11:21
  • Newbie to python. Can anybody please explain what's the simple trick used here? – CodingOwl Apr 19 '18 at 19:19
  • `>>> s1 = "welcome\tto\tPython"` `>>> raw_s1 = "%r"%s1` `>>> s2 = r"welcome\tto\tPython"` `>>> raw_s1 == s2` `False` – weefwefwqg3 Oct 10 '18 at 16:07
  • This is just `repr` in disguise. And it doesn't really answer the question correctly. – wim Oct 10 '18 at 16:50
5
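# Python 2 only: the 'string_escape' codec does not exist in Python 3
a='\nu + \lambda + \theta'
# escape the control characters, then collapse the doubled backslashes again
d=a.encode('string_escape').replace('\\\\','\\')
print(d)
# \nu + \lambda + \theta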

This shows that there is a single backslash before the n, l and t:

print(list(d))
# ['\\', 'n', 'u', ' ', '+', ' ', '\\', 'l', 'a', 'm', 'b', 'd', 'a', ' ', '+', ' ', '\\', 't', 'h', 'e', 't', 'a']
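
On Python 3 the `string_escape` codec is gone. A rough equivalent of the snippet above can be sketched with the `unicode_escape` codec, assuming the text is plain ASCII; the backslash before `lambda` is written explicitly here only to avoid an invalid-escape warning, and the resulting string is the same as in the original literal.

```python
# Same contents as the literal '\nu + \lambda + \theta': a newline,
# a literal backslash before "lambda", and a tab before "heta".
a = '\nu + \\lambda + \theta'

# unicode_escape turns the newline and tab back into backslash sequences
# and doubles the existing backslash; decode to get a str again.
escaped = a.encode('unicode_escape').decode('ascii')

# Collapse the doubled backslash, as in the Python 2 version.
d = escaped.replace('\\\\', '\\')
print(d)
print(list(d)[:3])
```

Like the original, this is lossy: a backslash pair that was genuinely present in the text gets collapsed too, so it only patches up a string that has already been misread.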

There is something funky going on with your GUI. Here is a simple example of grabbing some user input through a Tkinter.Entry. Notice that the text retrieved only has a single backslash before the n, l, and t. Thus no extra processing should be necessary:

try:
    import Tkinter as tk  # Python 2
except ImportError:
    import tkinter as tk  # Python 3

def callback():
    print(list(text.get()))

root = tk.Tk()
root.config()

b = tk.Button(root, text="get", width=10, command=callback)

text=tk.StringVar()

entry = tk.Entry(root,textvariable=text)
b.pack(padx=5, pady=5)
entry.pack(padx=5, pady=5)
root.mainloop()

If you type \nu + \lambda + \theta into the Entry box, the console will (correctly) print:

['\\', 'n', 'u', ' ', '+', ' ', '\\', 'l', 'a', 'm', 'b', 'd', 'a', ' ', '+', ' ', '\\', 't', 'h', 'e', 't', 'a']

If your GUI is not returning similar results (as your post seems to suggest), then I'd recommend looking into fixing the GUI problem, rather than mucking around with string_escape and string replace.

unutbu
  • that’s nice if it’s python that interprets the string. if it’s pandoc, it might not work. do you know what else (apart from backslashes) is done by `string_escape`? maybe it does too much? – flying sheep Aug 31 '11 at 20:41
  • @flying sheep: The docs say `string_escape` ["produces a string that is suitable as string literal in Python source code."](http://docs.python.org/library/codecs.html). AFAIK, `string_escape` affects backslashes or backslashed characters and nothing else. Perhaps I'm wrong. Would be happy to learn if it does more. – unutbu Aug 31 '11 at 21:42
  • i don’t know more than you. most likely you are right. but again: if the point where the interpretation happens eats some escapes (such as `\s`→` `), then this will yield silent errors. he should find the source. – flying sheep Sep 01 '11 at 16:26
  • This does not work, it's not equivalent to a raw string: compare the result of `print(repr(r"\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15\16\17\18\19\20\x93"))` with `print("\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15\16\17\18\19\20\x93".encode('string_escape').replace('\\\\','\\'))`. – gaborous Nov 27 '16 at 02:05
3

When you read the string from the GUI control, it is already a "raw" string. If you print out the string you might see the backslashes doubled up, but that's an artifact of how Python displays strings; internally there's still only a single backslash.

>>> a='\nu + \lambda + \theta'
>>> a
'\nu + \\lambda + \theta'
>>> len(a)
20
>>> b=r'\nu + \lambda + \theta'
>>> b
'\\nu + \\lambda + \\theta'
>>> len(b)
22
>>> b[0]
'\\'
>>> print b
\nu + \lambda + \theta
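
The same check can be written as a small script, assuming Python 3 (where `print` is a function). Counting backslashes in the string itself versus in its `repr` shows that the doubling exists only in the displayed form.

```python
b = r'\nu + \lambda + \theta'
shown = repr(b)  # what the interactive prompt displays for b

print(len(b))             # length of the actual string
print(b.count('\\'))      # backslashes really stored in the string
print(shown.count('\\'))  # backslashes in the displayed form
print(b)                  # print() shows the string as stored
```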
Mark Ransom
2

I spent a lot of time trying different answers all around the internet, and I suspect the reason one thing works for some people and not for others is very small, weird differences in application. For context, I needed to read in file names from a csv file that had strange and/or unmappable unicode characters and write them to a new csv file. For what it's worth, here's what worked for me:

s = '\u00e7\u00a3\u0085\u00e5\u008d\u0095' # csv freaks if you try to write this
# Python 3: repr() of the bytes is "b'...'"; [2:-1] strips the b' prefix and the closing quote
s = repr(s.encode('utf-8', 'ignore'))[2:-1]
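
For reference, a small sketch, assuming Python 3, of what those two lines produce: every non-ASCII character ends up spelled out as `\x..` byte escapes, so the result is plain ASCII text. The intermediate names `encoded` and `as_text` are introduced here only for illustration.

```python
s = '\u00e7\u00a3\u0085\u00e5\u008d\u0095'

encoded = s.encode('utf-8', 'ignore')  # bytes; each character becomes two bytes
as_text = repr(encoded)[2:-1]          # drop the leading b' and the trailing '

print(as_text)
print(all(ord(ch) < 128 for ch in as_text))  # only ASCII characters remain
```

The slice assumes `repr` wraps the bytes in a one-character quote, which holds whether it chooses single or double quotes.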
Katherine