0

I have a plain string 'бекслеш \018 на точку' in Python 3. The string comes from an external HTML page, so it doesn't have the "r" prefix of a raw string, and I don't know how to convert it to a raw string.

How can I replace the '\' with a dot '.'?

I've tried the following:

s = get_string()  # 'бекслеш \018 на точку'
print(s.replace('\\', '.'))
out: бекслеш 8 на точку

But I need 'бекслеш .018 на точку'.

UPD: It is clear that Python interprets the backslash as the start of an escape sequence for a control character. The question is how to do the replacement when the string cannot be written as a raw literal, or when it is unclear how to convert it to raw.

bl79
  • 3
    I think that `\018` is being interpreted as a hex/unicode character. – Tim Biegeleisen Apr 03 '18 at 03:00
  • 2
    There is no \ in the string. – wim Apr 03 '18 at 03:01
  • 1
    String objects don't have prefixes. String _literals_ in your source code do, but once the literal is interpreted by Python, it doesn't matter whether it was `r'a\b'` or `'a\\b'`—they both turn into the same string, `a\b`. – abarnert Apr 03 '18 at 03:09

2 Answers

3

The only difference between a regular string literal and a raw string literal is how your source code is interpreted to create a string object. The objects they create are not distinct in any way, so there is no such thing as converting a string to a raw string.
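To make that concrete, here is a minimal sketch (the variable names are only illustrative) showing that a regular literal and a raw literal can spell the very same string object contents:

```python
# Two spellings of the same three-character string: a, backslash, b.
regular = 'a\\b'   # regular literal: the backslash must be escaped
raw = r'a\b'       # raw literal: the backslash is taken literally

print(regular == raw)             # True
print(type(regular), type(raw))   # both are plain str
print(len(raw))                   # 3
```

Once the literal has been evaluated, nothing in the resulting object records which spelling was used.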

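In this case, '\018' is parsed as the octal escape '\01', which is '\x01', the Start of Heading control character, followed by the character '8'. The digit 8 is not a valid octal digit, so it ends the escape sequence.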

chr(1) + '8' == '\x018' # True

And as you can see, your string contains no '\\' character.

'\\' in 'бекслеш \018 на точку' # False
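
By contrast, text that really arrives from outside the program is never run through Python's escape processing, so a backslash in it stays a backslash. The sketch below assumes a hypothetical stand-in for such text (`fetched`, built with chr(92) so that no literal escape is involved) and compares it with the same characters typed as a regular literal:

```python
# Hypothetical stand-in for text read from an external source: it contains a
# real backslash (code point 92) followed by the characters 0, 1, 8.
fetched = 'бекслеш ' + chr(92) + '018 на точку'

# The same text typed as a regular literal is parsed differently:
# \01 is an octal escape, and 8 is left as an ordinary digit.
typed = 'бекслеш \018 на точку'

print('\\' in fetched)              # True
print('\\' in typed)                # False
print(fetched.replace('\\', '.'))   # бекслеш .018 на точку
print(len(fetched), len(typed))     # 21 19
```

If the replacement fails on real data, it is worth checking with repr() whether the backslash was already lost somewhere upstream.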
Olivier Melançon
  • \018 does not stand for a character. it is an \x01 and a plain old '8'. \x01 is "Start of heading" control character. – wim Apr 03 '18 at 03:16
  • 1
    Thanks I was looking for what character that was. – Olivier Melançon Apr 03 '18 at 03:22
  • It is clear that '\018' interpreted as '\x018'. How to disable this for replacement work? – bl79 Apr 03 '18 at 05:04
  • @bl79 There's nothing to replace, though. That's what this answer is saying – OneCricketeer Apr 03 '18 at 05:14
  • @bl79. I think that you're missing the point here. There is a difference between the literal representation `\x01` and the in-memory representation. You don't seem to be clear on which is which. – Mad Physicist Apr 03 '18 at 05:14
  • @bl79 The point of the answer is that what you thought was there isn't. So you may want to rethink how you want to parse that string. Start of Header is not a printable character so I doubt you want to replace it by anything. Maybe you want to remove it..? And if you want to replace it, see the [answer by pylang](https://stackoverflow.com/a/49622782/5079316). – Olivier Melançon Apr 03 '18 at 12:55
2

I think you actually want to replace the control character:

Code

print(s.replace("\x01", ".01"))
# бекслеш .018 на точку

Details

You wrote: "It is clear that the programming language interprets the backslash as a control character."

Actually, the control character comes from the whole escape sequence: the escape character (\) together with the adjacent code (01). Let's see how Python looks at each character:

print(list(s))
# ['б', 'е', 'к', 'с', 'л', 'е', 'ш', ' ', '\x01', '8', ' ', 'н', 'а', ' ', 'т', 'о', 'ч', 'к', 'у']

Notice that \x01 is a single character; there is no separate backslash left to replace. You have to replace this entire character.
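
As a quick way to check this on your own data, the following sketch prints the repr of the string and lists every control character with its position. The name `controls` is just an illustrative choice:

```python
from unicodedata import category

s = 'бекслеш \018 на точку'  # the literal from the question

print(repr(s))   # 'бекслеш \x018 на точку'

# Position, code point, and Unicode category of every control character ("Cc").
controls = [(i, ord(ch), category(ch)) for i, ch in enumerate(s) if category(ch) == 'Cc']
print(controls)  # [(8, 1, 'Cc')]
```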


Addendum

A more general approach is to iterate over the characters and substitute any that belong to the control character category with a new string, formatted from the code point of the character it replaces. All other characters are kept unchanged.

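from unicodedata import category

# Replace each control character (Unicode category "Cc") with "." followed by
# its two-digit octal code, mirroring the octal escape that produced it.
"".join(".{:02o}".format(ord(char)) if category(char) == "Cc" else char for char in s)
# 'бекслеш .018 на точку'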
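
The same idea can be wrapped in a small helper. This is only a sketch: the function name `dot_escape_controls` and its `base` parameter are hypothetical, and it assumes the lost escape was written in two-digit octal (or decimal, if requested). The original spelling cannot be recovered for certain, since '\1', '\01', '\001' and '\x01' all produce the same character.

```python
from unicodedata import category

def dot_escape_controls(text, base="octal"):
    """Replace each control character (category Cc) with '.' plus its code."""
    spec = "{:02o}" if base == "octal" else "{:02d}"
    return "".join(
        "." + spec.format(ord(ch)) if category(ch) == "Cc" else ch
        for ch in text
    )

s = 'бекслеш \018 na точку'.replace('na', 'на')  # same string as in the question
print(dot_escape_controls(s))                   # бекслеш .018 на точку
print(dot_escape_controls("a\nb"))              # a.12b
print(dot_escape_controls("a\nb", "decimal"))   # a.10b
```

The newline example shows why the base matters: code point 10 is written 12 in octal, so the two formats only agree for code points below 8.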
pylang