11

File content:

40 13 123
89 123 2223
4  12  0

I need to store the whole .txt file as a binary array so that I can send it later to the server side which expects a binary input.


I've looked at Python's bytearray documentation. I quote:

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.


Some of my numbers are greater than 255, so I need a bytearray-like data structure that can hold numbers that do not fit in a single byte.

Tony Tannous
  • Do you mean you want the text representation stored as an array of int32? – xtofl Mar 10 '17 at 09:21
  • 1
    @xtofl yes. But my problem is that after I do so for each number, I would like to have it in a binary object. If I access the first line, I get the first number in binary representation. – Tony Tannous Mar 10 '17 at 09:23
  • 1
    Do you have an example of what you want exactly? `"101010"` isn't a binary object, it's a string representing 42 in binary. `42`, as an integer, is already stored as binary for Python. – Eric Duminil Mar 10 '17 at 09:24
  • 1
    @EricDuminil yes sir, sorry for my bad explanation. A byte is 8 bits, and it can be sent as `binary` data. I need to have a sequence of many numbers in binary so that I know when to stop reading to know my first number, second number and so on. One way is, as xtofl said, to represent each in 32 bits. But I can't make bytearray store more than 8 bits, as any number greater than 255 can't be stored in it. – Tony Tannous Mar 10 '17 at 09:36
  • 1
    So just use an int array and be done with it. Doesn't the server specify exactly which format it expects? – Eric Duminil Mar 10 '17 at 09:50
  • @EricDuminil the server does specify: `void func(1:binary message)` A `binary`. Looking in thrift types, binary: A byte array. – Tony Tannous Mar 10 '17 at 09:53

5 Answers

8

You might use the array/memoryview approach:

import array
a = array.array('h', [10, 20, 300])  # assume the inputs are signed short (16-bit) integers
memv = memoryview(a)
m = memv.cast('B')  # view the same memory as unsigned bytes
m.tolist()

This then gives [10, 0, 20, 0, 44, 1] on a little-endian machine (300 = 44 + 1 * 256).

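Depending on the usage, one might also do the following on Python 2 (on Python 3, tostring() is spelled tobytes(), and iterating over the result already yields integers):

L = array.array('h', [10, 20, 300]).tostring()
list(map(ord, list(L)))

This also gives [10, 0, 20, 0, 44, 1].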
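
For Python 3.2 and later, where `tobytes()` replaces `tostring()` (the old name was removed in Python 3.9) and `memoryview` accepts an `array` directly, the same idea can be sketched as follows. The variable names are illustrative, and the `byteswap()` call is only there so the byte layout is little-endian regardless of the machine:

```python
import array
import sys

values = array.array('h', [10, 20, 300])  # 'h' = signed short, typically 2 bytes per item

# array uses the machine's native byte order; swap on big-endian
# machines so the layout below is little-endian everywhere.
if sys.byteorder == 'big':
    values.byteswap()

raw = values.tobytes()                               # immutable bytes object
byte_values = memoryview(values).cast('B').tolist()  # same memory viewed as unsigned bytes
mutable = bytearray(raw)                             # mutable copy, if a bytearray is needed

# Rebuild the numbers from the raw bytes
restored = array.array('h')
restored.frombytes(raw)

print(byte_values)         # [10, 0, 20, 0, 44, 1]
print(restored == values)  # True
```

Either `raw` or `mutable` can be handed to code that expects binary data; the receiver needs to know the item type (`'h'` here) and byte order to decode it.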

ewcz
  • Nice! I see there is also `array.fromlist(...)`. – xtofl Mar 10 '17 at 09:29
  • `TypeError: cannot make memory view because object does not have the buffer interface` I read that array.array object supports this only on python 3 ? http://stackoverflow.com/questions/4877866/why-is-it-not-possible-to-get-a-py-buffer-from-an-array-object – Tony Tannous Mar 10 '17 at 10:02
  • 2
    @xtofl it works fine on Python 3, but unfortunately it looks like applying memoryview on an array is not supported by Python 2.7 - http://bugs.python.org/issue17145 – ewcz Mar 10 '17 at 10:02
  • @TonyTannous then I would just change `'h'` to `'i'`, i.e., replace short integer with integer... – ewcz Mar 10 '17 at 10:20
  • I removed comment. It works now. Just give me a couple of minutes please. Perfect! – Tony Tannous Mar 10 '17 at 10:21
  • One question sir, a string is a binary=byte array ? – Tony Tannous Mar 10 '17 at 11:14
  • @TonyTannous you can construct one as, e.g., `bytearray(array.array('b', [10, 20, 30]).tostring())` – ewcz Mar 10 '17 at 11:19
  • It worked though with string alone. So I wondered why :) it works no need to do further changes. Thanks! – Tony Tannous Mar 10 '17 at 12:06
3

You can read in the text file and convert each 'word' to an int:

with open(the_file, 'r') as f:
    lines = f.readlines()
    numbers = [int(w) for line in lines for w in line.split()]

Then you have to pack numbers into a binary array with struct:

import struct
binary_representation = struct.pack("{}i".format(len(numbers)), *numbers)

If you want these data to be written in binary format, you have to specify so when opening the target file:

with open(target_file, 'wb') as f:
    f.write(binary_representation)
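
Putting the steps together, here is a minimal sketch of the round trip, assuming Python 3. The string `text` stands in for the file's contents so the example is self-contained, and the explicit `<` (little-endian, standard sizes) is an assumption about what the receiver expects; without it, `struct` uses the machine's native byte order and alignment.

```python
import struct

# Stand-in for the contents of the .txt file from the question
text = "40 13 123\n89 123 2223\n4  12  0\n"
numbers = [int(word) for word in text.split()]

# '<' = little-endian with standard sizes, 'i' = 4-byte signed integer
packed = struct.pack("<{}i".format(len(numbers)), *numbers)

# The receiver can work out the count from the length: 4 bytes per number
count = len(packed) // struct.calcsize("<i")
unpacked = list(struct.unpack("<{}i".format(count), packed))

print(len(packed))  # 36
print(unpacked)     # [40, 13, 123, 89, 123, 2223, 4, 12, 0]
```

Because every number occupies exactly 4 bytes, the receiver always knows where one number ends and the next begins.
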
xtofl
  • I agree that this double list comprehension syntax would be more readable, but unfortunately, it doesn't work. Also, if you iterate on a string, you get the characters, not the words. – Eric Duminil Mar 10 '17 at 09:43
  • Busted. It's the other way around. Thanks – xtofl Mar 10 '17 at 11:00
2

Not bytearray

From the bytearray documentation, it is just a sequence of integers in the range 0 <= x < 256.

As an example, you can initialize it like this:

bytearray([40,13,123,89,123,4,12,0])
# bytearray(b'(\r{Y{\x04\x0c\x00')

Since integers are already stored in binary, you don't need to convert anything.

Your problem now becomes: what do you want to do with 2223?

>>> bytearray([2223])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: byte must be in range(0, 256)
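
One possible answer is to spread each number over several bytes. As a sketch, assuming Python 3 (where `int.to_bytes` and `int.from_bytes` exist) and an arbitrary choice of 4 bytes per number in big-endian order:

```python
n = 2223

raw = n.to_bytes(4, byteorder='big')   # 4 bytes, most significant first
print(list(raw))                       # [0, 0, 8, 175], since 2223 = 8 * 256 + 175

restored = int.from_bytes(raw, byteorder='big')
print(restored)                        # 2223

# A bytearray can hold the same bytes, because each one is below 256
buf = bytearray(raw)
```

The width and byte order are a convention that sender and receiver have to share; nothing in the bytes themselves records them.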

uint32 or int32 array?

To read one file, you could use:

with open('test.txt') as f:
    # split() with no argument handles runs of spaces and the trailing newline
    numbers = [int(w) for line in f for w in line.split()]
    print(numbers)
    # [40, 13, 123, 89, 123, 2223, 4, 12, 0]

Once you have an integer list, you could choose the corresponding low-level NumPy data type, possibly uint32 or int32.

Eric Duminil
1

I needed this for a server-client module in which one of the functions required a binary input. Different Thrift types can be found here.

Client

import array

myList = [5, 999, 430, 0]
binL = array.array('l', myList).tostring()  # on Python 3, use tobytes() instead
# call function with binL as parameter

On the server, I reconstructed the list:

k = list(array.array('l', binL))
print(k)
# [5, 999, 430, 0]
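
One caveat: the size of `'l'` is platform dependent (commonly 4 bytes on Windows and 8 bytes on 64-bit Linux), so the client and server must agree on it. Below is a sketch of a more portable variant using `struct` with a fixed-size, network-byte-order format. `encode_ints` and `decode_ints` are hypothetical helper names, not part of Thrift or the standard library:

```python
import struct

ITEM = struct.Struct("!q")  # '!' = network (big-endian) order, 'q' = 8-byte signed integer

def encode_ints(values):
    """Pack a list of integers into a bytes object, 8 bytes per value."""
    return struct.pack("!{}q".format(len(values)), *values)

def decode_ints(data):
    """Inverse of encode_ints."""
    if len(data) % ITEM.size != 0:
        raise ValueError("length is not a multiple of {} bytes".format(ITEM.size))
    count = len(data) // ITEM.size
    return list(struct.unpack("!{}q".format(count), data))

binL = encode_ints([5, 999, 430, 0])   # client side
print(decode_ints(binL))               # server side: [5, 999, 430, 0]
```
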
Tony Tannous
0

Try this:

input.txt:

40 13 123
89 123 2223
4  12  0

Code to parse input to output:

with open('input.txt', 'r') as _in:
    nums = map(bin, map(int, _in.read().split())) # read in the whole file, split it into a list of strings, then convert to integers, then convert to binary strings

with open('output.txt', 'w') as out:
    out.writelines(b.replace('0b', '') + '\n' for b in nums) # remove the `0b` prefix from each binary string, append `\n`, then write to file

output.txt:

101000
1101
1111011
1011001
1111011
100010101111
100
1100
0

Hope it helps.

Szabolcs
  • Thanks, but I don't want to write it to a new file as binary, I need to hold it in a binary-object. Like bytearray or so. But I appreciate your effort. Thanks. – Tony Tannous Mar 10 '17 at 09:20
  • @TonyTannous: Then your question doesn't make sense, and it looks like you don't know exactly what you want to send. – Eric Duminil Mar 10 '17 at 09:21
  • 1
    @EricDuminil he knows, but does not have the proper terms for it. – xtofl Mar 10 '17 at 09:22
  • 1
    @TonyTannous Then just use a list of `int`s. Later you can convert it to anything you want :D – Szabolcs Mar 10 '17 at 09:25