
How can I replace the characters `r'\xb0'` in a string with `r'\260'`? I have tried:

test = u'\xb0C'
test = test.encode('latin1')
test = test.replace(r'\xb0', r'\260')

But it doesn't work. The problem is that I must write the data to a file in octal escape format (e.g. `'\260C'`), not in hex format.
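For reference, here is a minimal sketch of that goal in Python 3 (the question's own code is Python 2). It assumes the text should be stored as latin-1 bytes and that every byte outside printable ASCII (32..126) should be written as a backslash followed by three octal digits. The function name `write_octal_escaped` is hypothetical.

```python
import io


def write_octal_escaped(text, fh, encoding="latin-1"):
    """Write `text` to the text-mode file object `fh`, replacing every byte
    outside printable ASCII (32..126) with a backslash + 3-digit octal escape.
    Hypothetical helper; characters that `encoding` cannot represent raise
    UnicodeEncodeError."""
    for byte in text.encode(encoding):  # iterating bytes yields ints in Python 3
        if 32 <= byte <= 126:
            fh.write(chr(byte))
        else:
            fh.write("\\%03o" % byte)  # e.g. 0xb0 (176) -> \260


buf = io.StringIO()
write_octal_escaped("\xb0C", buf)
print(buf.getvalue())  # \260C
```

Padding to three digits keeps a following digit from being read as part of the escape (`\12` followed by `3` would otherwise look like `\123`). To write to disk, pass a file opened with `open(path, "w", encoding="ascii")`. Every non-printable byte is escaped, so the output is pure ASCII. A literal backslash in the input is passed through unchanged.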

alko
user2973395
    You don't want to replace `r'\xb0'`, do you? You want to replace the *character*, not the sequence of 4 characters. `.replace('\xb0', r'\260')` would have been more appropriate. – Martijn Pieters Nov 09 '13 at 20:00
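The comment's point can be shown with a small Python 3 sketch, where `str` plays the role of the question's `unicode`. The string `'\xb0C'` holds two characters: the degree sign U+00B0 and `C`. Searching for the raw four-character text `\xb0` therefore finds nothing. Searching for the character itself works.

```python
test = "\xb0C"  # two characters: U+00B0 (degree sign) and 'C'
print(len(test))  # 2

# Looks for backslash, 'x', 'b', '0'; that text is not present, so nothing changes
unchanged = test.replace(r"\xb0", r"\260")
print(unchanged == test)  # True

# Looks for the degree-sign character itself and swaps in the 4-character text \260
fixed = test.replace("\xb0", r"\260")
print(fixed)  # \260C
print(len(fixed))  # 5

# The same works on the latin-1 encoded bytes from the question
data = test.encode("latin-1").replace(b"\xb0", rb"\260")
print(data)  # b'\\260C'
```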

1 Answer


Do you mean

>>> test.encode('unicode-escape').replace(r'\xb0', r'\260')
'\\260C'

or

>>> ''.join('\\%o' % ord(c) for c in test)
'\\260\\103'

or the most general approach (which, as it turned out, is what the OP actually needed):

>>> table = {i: unicode(chr(i)) if 32 <= i < 128 else u'\\%o' % i for i in range(256)}
>>> u'\xb0ABD\260'.translate(table)
u'\\260ABD\\260'
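The table snippet above is Python 2: it uses `unicode` and `u''` literals, and the dict comprehension needs 2.7 or later. Here is a minimal Python 3 port that assumes the same rule. Code points 32..127 are kept, and other code points below 256 become octal escapes. Anything above 255 is not in the table and is left untouched.

```python
# In Python 3, str is Unicode, so chr(i) replaces Python 2's unicode(chr(i))
table = {i: chr(i) if 32 <= i < 128 else "\\%o" % i for i in range(256)}

# "\260" in a Python 3 str literal is also an octal escape for U+00B0
print("\xb0ABD\260".translate(table))  # \260ABD\260

# Equivalent smaller table: str.translate leaves unmapped characters as they are
escapes = {i: "\\%o" % i for i in [*range(32), *range(128, 256)]}
print("\xb0ABD\260".translate(escapes) == "\xb0ABD\260".translate(table))  # True
```

The identity entries for 32..127 are not strictly needed. Only the code points that should be escaped have to appear in the table.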
alko
  • One more question: how can I avoid the double backslash (`\\`)? I want only one ;) – user2973395 Nov 09 '13 at 19:37
  • What do you mean by _remove backslash_? `'\\260C'` is merely the representation (repr) of the string `r'\260C'`, which contains a single backslash. Also, the strings `'\260C'` and `'\xb0C'` are equal: `'\260C' == '\xb0C'` outputs `True`. – alko Nov 09 '13 at 19:44
  • My problem is this result: `m/s\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\262`. It happens because I check "all" possible variants, e.g. `text = text.encode('unicode-escape').replace(r'xb0', r'260')`, then `text = text.encode('unicode-escape').replace(r'\xb2', r'\262')`, and so on. Each call adds another backslash. I think my solution is bad. – user2973395 Nov 09 '13 at 20:00
  • You could have done without the unicode escaping. `.replace('\xb0', r'\260')` would have done just fine. – Martijn Pieters Nov 09 '13 at 20:00
  • @user2973395: What, `u'\xb0'.encode('latin1').replace('\xb0', r'\260')` works fine for me. Can you give us a test case instead? Sample input with expected output. – Martijn Pieters Nov 09 '13 at 20:13
  • @user2973395 I added a third snippet to handle more cases; see the update. – alko Nov 09 '13 at 20:14
  • You're right, MartijnPieters, my fault ;) I used the `'\xb0'` string without the `u` prefix. I think it's too late in the day for me. But thanks for the help. – user2973395 Nov 09 '13 at 20:22
  • @MartijnPieters: With the string `u'm/s\xb2'` it doesn't work. alko: Thanks, I will try it. – user2973395 Nov 09 '13 at 20:34
  • @user2973395: That's because that is a different codepoint. If you need to handle different codepoints, then you'd need to use `translate()` or a regular expression. Moreover, you've underspecified your question then. – Martijn Pieters Nov 09 '13 at 20:35
  • @alko: which is why I mentioned it. :-) – Martijn Pieters Nov 09 '13 at 20:40
  • @MartijnPieters: Hmm, it sounds a little bit complicated. – user2973395 Nov 09 '13 at 20:42
  • @user2973395: There is no silver bullet here. If you need to replace certain bytes with a non-standard encoding using octal escapes, you need to do so with the right tools. The answer here provides you with such a tool. – Martijn Pieters Nov 09 '13 at 20:44
  • @alko: Your example throws an exception ;) or I'm doing something wrong. Running `>>> table = {i: unicode(chr(i)) if 32 <= i < 128 else u'\\%o' % i for i in range(256)}` fails with `File "", line 1` ... `SyntaxError: invalid syntax`. – user2973395 Nov 09 '13 at 20:46
  • Dict comprehensions need Python 2.7 or later; for older versions see http://stackoverflow.com/questions/1747817/python-create-a-dictionary-with-list-comprehension
  • OK, thanks. I rewrote the snippet for people like me ;)

        _table = {}
        for i in range(256):
            if 32 <= i < 128:
                _table[i] = unicode(chr(i))
            else:
                _table[i] = u'\\%o' % i
        print u'\xb0ABD\260'.translate(_table)

    – user2973395 Nov 09 '13 at 21:09
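The growing run of backslashes in the comments comes from re-applying `unicode-escape`. That codec also escapes the backslash itself, so every extra pass doubles each backslash already present. The minimal Python 3 sketch below shows that effect. It then shows the single-pass regular-expression alternative Martijn Pieters mentions. The helper name `octal_escape` is hypothetical. Its character class is chosen to mirror the answer's table: code points 32..127 are kept, and other code points below 256 are escaped.

```python
import re

s = "m/s\xb2"

# Re-encoding an already escaped string escapes its backslashes again
once = s.encode("unicode_escape").decode("ascii")      # m/s\xb2  (1 backslash)
twice = once.encode("unicode_escape").decode("ascii")  # m/s\\xb2 (2 backslashes)
print(once.count("\\"), twice.count("\\"))  # 1 2

# Single pass: replace each code point in 0..31 or 128..255 with \ + octal digits
_NON_PRINTABLE = re.compile(r"[\x00-\x1f\x80-\xff]")


def octal_escape(text):
    return _NON_PRINTABLE.sub(lambda m: "\\%o" % ord(m.group()), text)


print(octal_escape(s))        # m/s\262
print(octal_escape("\xb0C"))  # \260C

# The result is plain printable ASCII, so applying it again changes nothing
print(octal_escape(octal_escape(s)) == octal_escape(s))  # True
```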