
I need to change this string by escaping the Windows path delimiters. I don't define the original string myself, so I can't prepend the raw-string prefix 'r'.

I need this:

s = 'C:\foo\bar'

to be this:

s = 'C:\\foo\\bar'

Everything I can find here and elsewhere says to do this:

s.replace( r'\\', r'\\\\' )

(Why I should have to escape the character inside a raw string I can't imagine)

But printing the string results in this. Obviously something has decided to re-interpret the escapes in the modified string:

C:♀oar

This would be so simple in Perl. How do I solve this in Python?

John Kugelman
user2647567
  • If the string you're working with wasn't escaped properly then it contains formfeed (ASCII 12) and backspace (ASCII 8) characters. It's too late to reverse that! That is to say, you could try to replace them with their corresponding escape sequences, but it'd be a kludge. – John Kugelman Aug 03 '13 at 00:09
  • possible duplicate of [Windows path in python](http://stackoverflow.com/questions/2953834/windows-path-in-python) – Ryan Haining Aug 03 '13 at 00:16
  • it's pretty simple in python too. – Ryan Haining Aug 03 '13 at 00:16
  • I don't control the string, so 'r' isn't possible. And I don't know where in the string the file path will show up in or what else is in the string, so os.path.join() won't work either. All I need to do is the equivalent of Perl's $s =~ s/\\/\\\\/g. – user2647567 Aug 03 '13 at 00:31
  • What do you mean "I don't control the string"? Is someone actually handing you a string that has `\f` and `\b` characters in the middle of it and expecting you to use it as if it had backslashes and `f` and `b` characters instead? Meanwhile, if it "would be so simple in Perl", show us the Perl code you'd write. – abarnert Aug 03 '13 at 00:34
  • Somebody is handing me the string, yes. I don't know what's in it, but there will likely be one or more Windows file paths. – user2647567 Aug 03 '13 at 00:35
  • Also, where have you found _anything_ that recommends `s.replace( r'\\', r'\\\\' )`? If `s` is already escaped, you don't want to doubly-escape things. If `s` is already full of control characters because it wasn't escaped, this won't help. So, it's either very bad advice, or advice for a different problem from yours. – abarnert Aug 03 '13 at 00:35
  • @user2647567: That perl code will not fix the problem. It does exactly the same thing as the `s.replace`—that is, nothing useful. – abarnert Aug 03 '13 at 00:37
  • Maybe you need to explain what you mean by "somebody is handing me a string". Is some other code calling your function, and passing a string in a variable? If so, does that string have a backspace, or a backslash and a b? Without knowing what you're actually doing, we can't help you. – abarnert Aug 03 '13 at 00:38
  • The string is in a file whose contents I don't control. It may contain backslashes in Windows file paths. I need to replace each '\' with '\\'. – user2647567 Aug 03 '13 at 00:42
  • @user2647567: You don't need to replace those at all. I'll write an answer and explain. – abarnert Aug 03 '13 at 01:08

2 Answers


After a bunch of questions back and forth, the actual problem is this:

You have a file with contents like this:

C:\foo\bar
C:\spam\eggs

You want to read the contents of that file and use the lines as pathnames, and you want to know how to escape the backslashes.

The answer is that you don't have to do anything at all.

Backslash escape sequences are processed only in string literals in your source code, not in string objects that you read from a file, or from input (in 3.x; in 2.x that's raw_input), etc. So, you don't need to escape those backslashes.

If you think about it, you don't need to add quotes around a string to turn it into a string. And this is exactly the same case. The quotes and the escaping backslashes are both part of the string's representation, not the string itself.


In other words, if you save that example file as paths.txt, and you run the following code:

with open('paths.txt') as f:
    file_paths = [line.strip() for line in f]
literal_paths = [r'C:\foo\bar', r'C:\spam\eggs']
print(file_paths == literal_paths)

… it will print out True.
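To see the difference between the string and its representation, here is a minimal self-contained sketch. It writes a throwaway copy of the example file to a temporary directory (the temporary file and the `raw_text` variable are assumptions made for illustration, not part of the original setup) and then compares `print` with `repr` on a line read back from it:

```python
import os
import tempfile

# Hypothetical file contents: one Windows path per line, single backslashes.
raw_text = "C:\\foo\\bar\nC:\\spam\\eggs\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "paths.txt")
    with open(path, "w") as f:
        f.write(raw_text)
    with open(path) as f:
        file_paths = [line.strip() for line in f]

first = file_paths[0]
print(first)        # the string itself: C:\foo\bar
print(repr(first))  # its source-code representation: 'C:\\foo\\bar'
print(len(first))   # 10, so each backslash is one character
```

The doubled backslashes show up only in the `repr` output, which is how the string would have to be spelled as a literal. The string object holds a single backslash in each position.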


Of course if your file was generated incorrectly and is full of garbage like this:

C:♀oar

Then there is no way to "escape the backslashes", because they're not there to escape. You can try to write heuristic code to reconstruct the original data that was supposed to be there, but that's the best you can do.

For example, you could do something like this:

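# Map each control character back to the two-character escape sequence
# (a backslash plus a letter) that most likely produced it.
backslash_map = { '\a': r'\a', '\b': r'\b', '\f': r'\f',
                  '\n': r'\n', '\r': r'\r', '\t': r'\t', '\v': r'\v' }
def reconstruct_broken_string(s):
    # Heuristic: assumes every such control character came from a path separator.
    for key, value in backslash_map.items():
        s = s.replace(key, value)
    return s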

But this won't help if there were any hex, octal, or Unicode escape sequences to undo. For example, 'C:\foo\x08' and 'C:\foo\b' both represent the exact same string, so if you get that string, there's no way to know which one you're supposed to convert to. That's why the best you can do is a heuristic.
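As a rough illustration of both points, the sketch below repeats the mapping from this answer so that it runs on its own, then checks what the broken string actually contains and where the heuristic loses information. The variable names `broken`, `fixed`, `from_hex` and `from_letter` are made up for the example.

```python
# Same mapping idea as in the answer, repeated so this sketch is self-contained.
backslash_map = {'\a': r'\a', '\b': r'\b', '\f': r'\f',
                 '\n': r'\n', '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def reconstruct_broken_string(s):
    for key, value in backslash_map.items():
        s = s.replace(key, value)
    return s

broken = 'C:\foo\bar'          # not raw: \f is a form feed, \b is a backspace
print(len(broken))             # 8 characters
print('\\' in broken)          # False: there is no backslash left to escape

fixed = reconstruct_broken_string(broken)
print(fixed == r'C:\foo\bar')  # True: the heuristic works for this input

# Two different literals produce one and the same string object,
# so the original spelling cannot be recovered.
from_hex = 'C:\foo\x08'
from_letter = 'C:\foo\b'
print(from_hex == from_letter)              # True
print(reconstruct_broken_string(from_hex))  # C:\foo\b, whichever was meant
```

The last line is the failure mode: the function always picks the letter form, which is only a guess about what the original text said.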

abarnert

Don't do s.replace(anything). Just stick an r in front of the string literal, before the opening quote, so you have a raw string. Anything based on string replacement would be a horrible kludge, since s doesn't actually have backslashes in it; your code has backslashes in it, but those don't become backslashes in the actual string.

If the string actually has backslashes in it, and you want the string to have two backslashes wherever there once was one, you want this:

s = s.replace('\\', r'\\')

That'll replace any single backslash with two backslashes. If the string literally appears in the source code as s = 'C:\foo\bar', though, the only reasonable solution is to change that line. It's broken, and anything you do to the rest of the code won't make it not broken.
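A short sketch of that replacement on a string that really does contain backslashes, assuming Python 3. It also shows what appears to be the closest counterpart of the Perl substitution from the comments, using `re.sub`, and why the `s.replace( r'\\', r'\\\\' )` line from the question leaves such a string unchanged:

```python
import re

s = r'C:\foo\bar'                  # really contains two backslashes

doubled = s.replace('\\', r'\\')   # one backslash -> two backslashes
print(doubled)                     # C:\\foo\\bar
print(len(s), len(doubled))        # 10 12

# In a regex, r'\\' matches one backslash, and the replacement template
# r'\\\\' expands to two, so this gives the same result.
print(re.sub(r'\\', r'\\\\', s) == doubled)   # True

# str.replace does no escape processing: r'\\' is two literal backslashes,
# which never occur in s, so nothing is replaced.
print(s.replace(r'\\', r'\\\\') == s)         # True
```

None of this helps when the backslashes were already consumed by a non-raw literal, which is the case the rest of this answer describes.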

user2357112