
When using file.write() on a file opened with the 'wb' flag, does Python write big-endian, little-endian, or whatever sys.byteorder says? How can I be sure the endianness is not arbitrary? I am asking because I am mixing ASCII and binary data in the same file: for the binary data I use struct.pack() and force it to little-endian, but I am not sure what happens to the ASCII data.
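
Roughly what the writing code looks like, simplified (the file name, header text and values are just placeholders):

import struct

with open('data.bin', 'wb') as f:
    f.write('SOME ASCII HEADER\n')        # plain ASCII text, written as-is
    f.write(struct.pack('<I', 1234))      # 4-byte unsigned int, forced little-endian
    f.write(struct.pack('<H', 42))        # 2-byte unsigned int, forced little-endian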

Edit 1: since the downvote, I'll explain my question in more detail.

I am writing a file with ASCII and binary data on an x86 PC. The file will be sent over the network to another computer which is not x86 but a PowerPC, which is big-endian. How can I be sure that the data will be the same when parsed on the PowerPC?

Edit 2: I am still using Python 2.7.

e-nouri
  • Not really, I saw that question and he asks about the ASCII data. In my case, I am using two formats, ASCII and binary: for the ASCII I don't use any packing with struct; for the binary data, I know the endianness since I force it to little. – e-nouri May 23 '14 at 14:14
  • I don't see how so-called ASCII data, which is just ISO-8859-1, is any different from anything else you are writing to the file. – James Mills May 23 '14 at 14:16
  • Thanks, so how can I force it to little? The files I am dealing with are created on an x86 machine and the other program is running on a PowerPC. – e-nouri May 23 '14 at 14:19
  • I said this in my answer, but... ASCII is a single-byte encoding; there is no endianness to it. – woot May 23 '14 at 14:55

3 Answers


For multibyte data, it follows the architecture of the machine by default. If you need the file to work cross-platform, you'll want to force the byte order explicitly.

ASCII and UTF-8 are byte-oriented encodings (ASCII is one byte per character), so are they affected by byte ordering? No.

Here is how to pack little-endian (<) or big-endian (>):

>>> import struct
>>> struct.pack('<L', 1234)
'\xd2\x04\x00\x00'
>>> struct.pack('>L', 1234)
'\x00\x00\x04\xd2'

You can also encode strings as big- or little-endian this way if you are using UTF-16, for example:

s.encode('utf-16LE')
s.encode('utf-16BE')

UTF-8 and ASCII have no endianness: ASCII is one byte per character, and UTF-8 defines the exact byte sequence for each character, so there is no byte-order choice to make.
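
For example, the same two-character string comes out byte-swapped depending on which UTF-16 variant you pick (a quick illustration with a made-up string, Python 2.7 session):

>>> u'AB'.encode('utf-16-le')
'A\x00B\x00'
>>> u'AB'.encode('utf-16-be')
'\x00A\x00B'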

woot
  • That is my question, mate: so how can I force it to little? The files I am dealing with are created on an x86 machine and the other program is running on a PowerPC! – e-nouri May 23 '14 at 14:19
  • Could you give an example of "multibyte data"? I suspect that you're assuming a language other than Python. – May 23 '14 at 14:24
  • Perhaps not the right term for it, then. ASCII is stored in a single byte per character; UTF-16 is multibyte, as an example. – woot May 23 '14 at 14:38
  • Thank you, now it makes sense. I should only care about the binary data's endianness, since it is not single-byte data (it uses 2 and 4 bytes). – e-nouri May 23 '14 at 15:01
  • Well, you care about anything that uses more than one byte. So a byte, ASCII, and UTF-8 encoded strings are fine (1 byte per character); UTF-16 and up, and 16-bit, 32-bit, 64-bit numbers need proper packing. – woot May 23 '14 at 15:03

It uses the machine's native byte order, which is what sys.byteorder reports. So just:

import sys

if sys.byteorder == 'little':
    pass  # native byte order is little-endian
else:
    pass  # native byte order is big-endian
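
As a quick check you can run yourself, struct with no explicit byte-order prefix follows the same order sys.byteorder reports (the output below is what a little-endian x86 machine prints; it would be reversed on a big-endian machine):

>>> import struct, sys
>>> sys.byteorder
'little'
>>> struct.pack('=L', 1)   # '=' means native byte order, standard 4-byte size
'\x01\x00\x00\x00'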
cchristelis

Note: I assume Python 3.

Endianness is not a concern when writing ASCII or byte strings. The order of the bytes is already set by the order in which those bytes occur in the ASCII/byte string. Endianness is a property of encodings that map some value (e.g. a 16-bit integer or a Unicode code point) to several bytes. By the time you have a byte string, the endianness has already been decided and applied (by whatever produced the byte string).

If you were to write Unicode strings to a file not opened in binary ('b') mode, the answer would depend on how those strings are encoded (they are necessarily encoded, because what ends up on disk is always bytes). The encoding in turn depends on the file, and possibly on the locale or environment variables (e.g. for the default sys.stdout). When this causes problems, the problems extend beyond just endianness. However, your file is binary, so you can't write Unicode directly anyway; you have to encode and decode explicitly. Do this with any fixed encoding and there won't be endianness issues, because an encoding's endianness is fixed and part of its definition.
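
A minimal sketch of that explicit-encode approach (Python 3; the file name, header text and values are made up):

import struct

with open('mixed.bin', 'wb') as f:
    # text part: the chosen encoding fully determines the byte sequence
    f.write('SOME ASCII HEADER\n'.encode('ascii'))
    # binary part: the '<' prefix pins the integer to little-endian
    f.write(struct.pack('<I', 1234))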