5

I need to know the number of bytes in a 'word' in Python. I have the number of words I need to read from a file; if I knew the number of bytes in a word, I could use the file.read(num_bytes) function to read the appropriate amount from the file.

How can I determine the number of bytes in a word?

thefourtheye
jlconlin
  • Define 'word'. Are you referring to the unit of memory or the linguistic concept? – Rafe Kettler Aug 02 '11 at 20:40
  • I didn't specify. I was meaning the unit of memory, which I guess is undefined in Python as @TokenMacGuy states. – jlconlin Aug 02 '11 at 20:50
  • The number of bytes in a word should be determined by the file format, not by Python. You should look to the application that created the file. – Mark Ransom Aug 02 '11 at 20:51
  • @Jeremy TokenMacGuy is correct, there's no standard word in Python. Different files/platforms will behave differently. – Rafe Kettler Aug 02 '11 at 21:44
  • There is, however, a well defined concept for "address size", which the accepted answer reports, but probably has nothing at all to do with data that could be found in a file (unless something in `/sys` or `/proc` on linux systems with sysfs/procfs exposes binary address values) – SingleNegationElimination Aug 03 '11 at 05:09

6 Answers

8

You can use the platform.architecture function:

>>> import platform
>>> platform.architecture()
('64bit', '')

Pay attention to the note on the same page:

Note: On Mac OS X (and perhaps other platforms), executable files may be universal files containing multiple architectures. To get at the “64-bitness” of the current interpreter, it is more reliable to query the sys.maxsize attribute:

is_64bits = sys.maxsize > 2**32

Please keep in mind that this gives the word size with which the Python interpreter was compiled. You could obtain a value of 32 bits on a 64-bit host if Python was compiled in 32-bit mode.
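As a minimal sketch of checking the running interpreter itself, assuming 'word' here means the pointer width: `struct.calcsize("P")` reports the size of a C `void *`, and the second helper is the `sys.maxsize` test quoted above. The function names are illustrative only.

```python
import struct
import sys

def pointer_size_bytes():
    """Size in bytes of a C pointer (void *) in the running interpreter."""
    return struct.calcsize("P")

def is_64bit_interpreter():
    """The sys.maxsize check quoted from the platform documentation."""
    return sys.maxsize > 2**32

print(pointer_size_bytes(), is_64bit_interpreter())
```

Both values describe how Python was built, not the layout of any particular data file.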

If the file is produced by a different executable and you have access to this executable, you can use the first optional argument to the platform.architecture function:

>>> platform.architecture('/path/to/executable')
('32bit', '')
GaretJax
1

There is no concept of a 'word' in Python. When you read binary data from a file, you state explicitly how many bytes should be read at a time.

At the compiler or platform level, 'WORD' generally denotes the size of a basic data unit, and Python is independent of that kind of stuff :)

Zaur Nasibov
0

How about something like this:

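import sys

def machine_word_size():
    # sys.maxint was removed in Python 3; sys.maxsize is the closest equivalent.
    num_bytes = 0
    maxint = sys.maxsize
    # Count how many bytes are needed to hold the largest native signed integer.
    while maxint > 0:
        maxint = maxint >> 8
        num_bytes += 1
    return num_bytes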
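The same idea can be written without a loop. This is only a sketch and assumes Python 3, where `sys.maxint` no longer exists and `sys.maxsize` is used instead; the function name is made up for illustration.

```python
import sys

def native_int_size_bytes():
    # sys.maxsize is 2**(n-1) - 1 for an n-bit signed size type,
    # so its bit length plus one sign bit gives n.
    return (sys.maxsize.bit_length() + 1) // 8

print(native_int_size_bytes())
```

Like the loop version, this describes the interpreter build, which may differ from the word size used by whatever wrote the file.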
Sri
0

Perhaps the following might be relevant and helpful: suppose you are checking for 32 bits. In Python 2, see whether (-1)<<31 comes back as a long or not. On a 32-bit build it does not, while (-1)<<32 and 1<<31 do.

0

There's no really sound definition of what a word is, except that certain architectures call some number of bytes a 'word' (x86 calls 2 bytes a word, PPC calls 4 bytes a word), and there's not much significance beyond this arbitrary value.

Perhaps the simplest solution is to just defer to the struct module; for instance, the format 'h' means signed short (which reasonably agrees with the Intel definition of 'word'). So you could do this:

>>> import struct
>>> f = open('.vimrc', 'rb')  # binary mode; file() exists only in Python 2
>>> struct.unpack('h', f.read(struct.calcsize('h')))
(8226,)
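The struct approach extends to reading a known number of values in one call. The sketch below assumes the file stores 16-bit signed integers in little-endian order; both the helper name `unpack_shorts` and that layout are assumptions for illustration.

```python
import io
import struct

def unpack_shorts(stream, count, byteorder="<"):
    """Read `count` signed 16-bit integers from a binary stream."""
    # A leading "<" or ">" selects standard sizes, so "h" is always 2 bytes.
    fmt = f"{byteorder}{count}h"
    data = stream.read(struct.calcsize(fmt))
    return struct.unpack(fmt, data)

# Build a small in-memory sample instead of relying on a file on disk.
sample = io.BytesIO(struct.pack("<3h", 1, -2, 300))
print(unpack_shorts(sample, 3))
```

Giving the byte order explicitly keeps the result independent of the machine running the script, which matters when the file was produced elsewhere.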
SingleNegationElimination
0

I need to know the number of bytes in a 'word' in Python. The reason I need this is I have the number of words I need to read from a file

Then you need to ask the person who wrote the file. It has nothing to do with Python and everything to do with what the actual file format is. It's pretty odd for a file to be defined as a sequence of words BTW. It is most probably a sequence of 16- or 32-bit integers, or else it really is words in the text sense, in which case you are really scanning the file for tokens between whatever the delimiters are.
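If the file really does hold words in the text sense, reading a given number of them might be sketched as follows. Whitespace is assumed as the delimiter, and `first_words` is a hypothetical helper name.

```python
import io
import itertools

def first_words(stream, count):
    """Return the first `count` whitespace-delimited tokens from a text stream."""
    # Lazily walk the file line by line so large files are not read whole.
    tokens = (tok for line in stream for tok in line.split())
    return list(itertools.islice(tokens, count))

sample = io.StringIO("alpha beta\ngamma  delta epsilon\n")
print(first_words(sample, 3))
```

For a binary file of fixed-width integers, the word size would instead come from the format's documentation.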

user207421