2

I see a string in this code:

data[:2] == '\xff\xfe'

I don't know what '\xff\xfe' is,

so I tried to escape it, but without success:

import cgi
print cgi.escape('\xff\xfe')#print \xff\xfe

How can I get it?

Thanks

sorin
zjm1126

4 Answers

11

'\xFF' means the byte with the hex value FF. '\xff\xfe' is a byte-order mark (the one used for little-endian UTF-16): http://en.wikipedia.org/wiki/Byte_order_mark

You could also represent it as two separate characters but that probably won't tell you anything useful.
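A minimal sketch of both views, written for Python 3 (an assumption; the question uses Python 2, where the literal would have no `b` prefix). The variable names are illustrative only.

```python
import codecs

# The two bytes from the question, written as a Python 3 bytes literal.
marker = b'\xff\xfe'

# The standard library exposes the same two bytes under a descriptive name.
is_utf16_le_bom = (marker == codecs.BOM_UTF16_LE)

# Viewed as two separate values, the marker is just the integers 255 and 254.
byte_values = list(marker)

# Encoding U+FEFF as little-endian UTF-16 produces exactly these bytes.
encoded_bom = '\ufeff'.encode('utf-16-le')

print(is_utf16_le_bom, byte_values, encoded_bom == marker)
```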

Tyler
2
>>> print '\xff\xfe'.encode('string-escape')
\xff\xfe
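The 'string-escape' codec exists only in Python 2. As a rough Python 3 counterpart (an assumption, since the question is about Python 2), the escaped spelling can be obtained with `repr()` or by going through the `latin-1` and `unicode_escape` codecs:

```python
data = b'\xff\xfe'

# repr() of a bytes object shows non-printable bytes as \xNN escapes.
as_repr = repr(data)

# Map each byte to the code point with the same value (latin-1), then
# let the unicode_escape codec spell those code points as \xNN.
escaped = data.decode('latin-1').encode('unicode_escape').decode('ascii')

print(as_repr)
print(escaped)
```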
Ignacio Vazquez-Abrams
2

What is the connection between "i don't know what '\xff\xfe' is" and "so i want to escape it"? What is the purpose of "escaping" it?

It would help enormously if you gave a little more context than data[:2] == '\xff\xfe' (say, a few lines before and after) ... however, it looks like it is testing whether the first two bytes of data could possibly represent a UTF-16 little-endian byte order mark. In that case you could do something like:

UTF16_LE_BOM = "\xff\xfe"

# much later
if data[:2] == UTF16_LE_BOM:
    do_something()
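The same idea can be extended to other byte order marks by using the constants in the standard `codecs` module. This is only a sketch in Python 3 syntax (an assumption); `sniff_bom` and `_BOMS` are hypothetical names, not part of any library.

```python
import codecs

# Known byte order marks. The UTF-32 marks come first because the
# UTF-32 LE mark starts with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]


def sniff_bom(data):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


sample = codecs.BOM_UTF16_LE + 'hi'.encode('utf-16-le')
encoding, bom_length = sniff_bom(sample)
if encoding is not None:
    # Skip the mark itself and decode the rest with the detected encoding.
    text = sample[bom_length:].decode(encoding)
```

A match only says the data could start with a BOM; whether it really is UTF-16 still depends on what the file is supposed to contain.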
John Machin
-1

You cannot escape or encode an invalid string.

You should understand that you are working with strings and not byte streams, and there are some characters you cannot accept in them, the first being 0x00, and also your example, which happens to be a BOM sequence.

So if you need to include invalid string characters (Unicode or ASCII), you will have to stop using strings for this.

Take a look at PEP-0358

Community
sorin
  • 3
    It would be a very good idea if you explained what your definition of "invalid string" is, and in particular what is "invalid" about "\x00" or "\xff\xfe". Have you noted that the OP appears to be using Python 2.x and not 3.x and so PEP-0358 has little relevance? – John Machin Jan 01 '10 at 03:39
  • Example: you cannot store 0x00 inside a C string because this is the string terminator. In the case of Unicode there are several other codes that you are not allowed to store inside. – sorin Jan 02 '10 at 10:05
  • 1
    Have you noticed that the OP is using Python, not C? I ask again: What is invalid about "\xff\xfe"? – John Machin Jan 02 '10 at 21:53
  • Usually Python uses C strings because it is implemented in C. Now regarding the value range: if using ASCII you are allowed to use only 0..127 (ANSI is 0..255). Also, if you are using Unicode you are allowed to use a wider range of values, but it happens that the two values specified are not accepted. Why? Because if you are using ANSI instead of ASCII you'll discover that you may get different results from decode when the OS codepage is different. Take a look at MatrixFlog's answer to see the meaning of 0xFFFE (can be used only at the beginning of the file). – sorin Jan 04 '10 at 15:26
  • 5
    When Python is implemented in C, it doesn't "use C strings". It uses C to implement Python strings, which have quite different semantics -- in particular "\x00" is quite legal. Your ASCII/ANSI stuff is irrelevant. MatrixFlog doesn't mention 0xFFFE, he mentions '\xff\xfe' which is NOT the same thing as 0xFFFE, is a LEGAL Python string and is POSSIBLY interpretable as a BOM (depends on an agreement that the file is encoded in UTF-16; the OP has NOT supplied that info). U+FEFF not at the start of a UTF-16 file is a zero-width no-break space (quite legal). – John Machin Jan 05 '10 at 12:22
  • @JohnMachin, python 2.x packages that assume `str`s are UTF-8 (e.g. the `json` package) will raise UnicodeDecodeError on `json.dumps('\xff\xfe')`. Maybe that's why Sorin called this BOM-like byte sequence an "invalid string". – hobs Jan 03 '14 at 00:39
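The points raised in these comments can be checked interactively. The following is a small sketch in Python 3 syntax (an assumption; the thread concerns Python 2), with illustrative variable names:

```python
data = b'\xff\xfe'

# A zero byte is an ordinary element of a Python byte string.
with_nul = b'a\x00b'
nul_length = len(with_nul)

# The byte pair is not the number 0xFFFE: read as a little-endian
# 16-bit integer it is 0xFEFF; only a big-endian reading gives 0xFFFE.
as_little_endian = int.from_bytes(data, 'little')
as_big_endian = int.from_bytes(data, 'big')

# If the data is agreed to be UTF-16 LE, the pair decodes to U+FEFF.
decoded = data.decode('utf-16-le')

# The same bytes are not valid UTF-8, so code that assumes UTF-8
# input rejects them with UnicodeDecodeError.
try:
    data.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```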