12

I have a string (it could be an integer too) in Python and I want to write it to a file. It contains only ones and zeros, and I want that pattern of ones and zeros to be written to a file. I want to write the binary directly because I need to store a lot of data, but only certain values. I see no need to take up the space of using eight bits per value when I only need three.

For instance, let's say I were to write the binary string "01100010" to a file. If I opened it in a text editor it would say b (01100010 is the ASCII code for b). Do not be confused though. I do not want to write ASCII codes; the example was just to indicate that I want to directly write bytes to the file.


Clarification:

My string looks something like this:

binary_string = "001011010110000010010"

It is not made up of the binary codes for numbers or characters. It contains data relative only to my program.

duhaime
KFox

5 Answers

12

To write out a string you can use the file's .write method. To write an integer, you will need to use the struct module:

import struct

#...
with open('file.dat', 'wb') as f:
    if isinstance(value, int):
        f.write(struct.pack('i', value)) # write an int
    elif isinstance(value, str):
        f.write(value.encode('ascii')) # write a string (binary mode needs bytes in Python 3)
    else:
        raise TypeError('Can only write str or int')

However, the representations of int and string are different; you may wish to use the bin function instead to turn it into a string of 0s and 1s:

>>> bin(7)
'0b111'
>>> bin(7)[2:] #cut off the 0b
'111'

but maybe the best way to handle all these ints is to decide on a fixed width for the binary strings in the file and convert them like so:

>>> x = 7
>>> '{0:032b}'.format(x) #32 character wide binary number with '0' as filler
'00000000000000000000000000000111'
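As a rough sketch of how fixed-width strings round-trip, here is one way to encode a few values and decode them again. The 3-bit width and the helper names `to_bits` and `from_bits` are illustrative assumptions, not something from the question.

```python
WIDTH = 3  # assumed field width; the question mentions needing three bits per value


def to_bits(value, width=WIDTH):
    """Format a non-negative int as a zero-padded binary string."""
    if value < 0 or value >= 1 << width:
        raise ValueError('value does not fit in %d bits' % width)
    return '{0:0{1}b}'.format(value, width)


def from_bits(bits):
    """Parse a string of 0s and 1s back into an int."""
    return int(bits, 2)


values = [3, 2, 6]
bit_string = ''.join(to_bits(v) for v in values)

# Split the string back into fixed-width fields and decode each one.
decoded = [from_bits(bit_string[i:i + WIDTH])
           for i in range(0, len(bit_string), WIDTH)]
```

A fixed width is what makes decoding possible: without it there is no way to tell where one value ends and the next begins.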
Iron Fist
Ryan Haining
  • This is not what I want. I edited the question to show more specifically what I need to do. – KFox Jun 02 '13 at 22:49
  • @KFox so you're saying you want a function that: given either an `int` or a `str` of 0s and 1s, you want to write that to a file as a binary representation of an int rather than a string of 0/1 characters. I'm not sure what you mean by "when I only need 3." do you mean you only use 24 bits in the largest value? – Ryan Haining Jun 02 '13 at 23:03
  • Sorry, it _was_ rather unclear. I've found the answer now anyway. Thanks for your response though. – KFox Jun 03 '13 at 00:14
8

Alright, after quite a bit more searching, I found an answer. I believe that the rest of you simply didn't understand (which was probably my fault, as I had to edit twice to make it clear). I found it here.

The answer was to split up each piece of data, convert each piece into an integer, then put them in an array of unsigned bytes. After that, you can use the array's tofile() method to write to a file.

from array import array

bin_array = array('B')  # 'B' = unsigned char, one byte per item

bin_array.append(int('011',2))
bin_array.append(int('010',2))
bin_array.append(int('110',2))

with open('binary.mydata', 'wb') as f:
    bin_array.tofile(f)
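For completeness, a minimal sketch of reading such a file back with the array module. The temporary directory is only an assumption so that the example is self-contained; any path would do.

```python
import os
import tempfile
from array import array

values = array('B', [0b011, 0b010, 0b110])

# Hypothetical location, used only to keep the example self-contained.
path = os.path.join(tempfile.mkdtemp(), 'binary.mydata')
with open(path, 'wb') as f:
    values.tofile(f)

restored = array('B')
with open(path, 'rb') as f:
    # fromfile() needs an item count; for typecode 'B' each item is
    # one byte, so the count equals the file size in bytes.
    restored.fromfile(f, os.path.getsize(path))
```

Each value still occupies a full byte on disk with this approach, so the file is three bytes long for three values.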
Ryan Haining
KFox
4

I want that pattern of ones and zeros to be written to a file.

If you mean you want to write a bitstream from a string to a file, you'll need something like this...

# Python 3
from io import StringIO

s = "001011010110000010010"
sio = StringIO(s)

with open('outfile', 'wb') as f:
    while True:
        # Grab the next 8 bits
        b = sio.read(8)

        # Bail if we hit EOF
        if not b:
            break

        # If we got fewer than 8 bits, pad with zeroes on the right
        if len(b) < 8:
            b = b + '0' * (8 - len(b))

        # Convert to int
        i = int(b, 2)

        # Write the value as a single raw byte
        f.write(bytes([i]))

...for which xxd -b outfile shows...

0000000: 00101101 01100000 10010000                             -`.
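In Python 3 the same right-padded packing can be sketched with `int.to_bytes`, which avoids the character-by-character loop. The function name `bits_to_bytes` is a hypothetical helper, not part of any library.

```python
def bits_to_bytes(bit_string):
    """Pack a string of 0s and 1s into bytes, zero-padding on the right."""
    padded_len = -(-len(bit_string) // 8) * 8  # round up to a multiple of 8
    padded = bit_string.ljust(padded_len, '0')
    if not padded:
        return b''  # int('', 2) would raise, so handle empty input here
    # Big-endian keeps the first bit of the string in the first byte.
    return int(padded, 2).to_bytes(padded_len // 8, byteorder='big')


data = bits_to_bytes('001011010110000010010')
# data can then be written with open('outfile', 'wb').write(data)
```

Note that the padding bits are indistinguishable from real data once written, so the reader needs some other way to know how many bits are meaningful.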
Aya
  • This breaks when the 8 bit chunk starts with "1" because then the utf-8 encoding is [longer](https://stackoverflow.com/a/33349765/6937913). This can be avoided in Python 3, [using to_bytes](https://gist.github.com/gngdb/3781ec8cba30769f881e9f9cbd54ed36). – gngdb Aug 27 '18 at 10:48
2

Brief example:

import struct

my_number = 1234
with open('myfile', 'wb') as file_handle:
    file_handle.write(struct.pack('i', my_number))
...
with open('myfile', 'rb') as file_handle:
    my_number_back = struct.unpack('i', file_handle.read())[0]
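One caveat worth sketching: the plain 'i' format uses the machine's native byte order and size, so a file written on one platform may not read back correctly on another. Assuming portability matters, an explicit prefix such as '<' (little-endian, standard sizes) pins the layout down:

```python
import struct

my_number = 1234

# '<' selects little-endian byte order and the standard 4-byte size for 'i'.
packed = struct.pack('<i', my_number)

# unpack() always returns a tuple, even for a single value.
(my_number_back,) = struct.unpack('<i', packed)
```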
ThorSummoner
0

Appending to an array.array 3 bits at a time will still produce 8 bits for every value. Appending 011, 010, and 110 to an array and writing to disk will produce the following output: 00000011 00000010 00000110. Note all the padded zeros in there.

It seems like, instead, you want to "compact" binary triplets into bytes to save space. Given the example string in your question, you can convert it to a list of integers (8 bits at a time) and then write it to a file directly. This will pack all the bits together using only 3 bits per value rather than 8.

Python 3.4 example

original_string = '001011010110000010010'

# first split into 8-bit chunks
bit_strings = [original_string[i:i + 8] for i in range(0, len(original_string), 8)]

# then convert to integers
byte_list = [int(b, 2) for b in bit_strings]

with open('byte.dat', 'wb') as f:
    f.write(bytearray(byte_list))  # convert to bytearray before writing

Contents of byte.dat:

  • hex: 2D 60 12
  • binary (by 8 bits): 00101101 01100000 00010010
  • binary (by 3 bits): 001 011 010 110 000 000 010 010

                                             ^^ ^ (note extra bits)

Note that this method pads the last value so that it aligns to an 8-bit boundary, and the padding goes into the most significant bits (the left side of the last byte in the output above). So you need to be careful, and possibly add zeros to the end of your original string to make its length a multiple of 8.
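One possible way to make the padding unambiguous is to pad on the right and store the bit count in front of the data, so the reader knows how many bits to keep. This is only a sketch under that assumption: the 4-byte little-endian header and the names `pack_bits` and `unpack_bits` are made up for illustration.

```python
import struct


def pack_bits(bit_string):
    """Return a 4-byte bit count followed by the bits, zero-padded on the right."""
    n = len(bit_string)
    padded = bit_string + '0' * (-n % 8)  # pad up to the next multiple of 8
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return struct.pack('<I', n) + body


def unpack_bits(blob):
    """Inverse of pack_bits: recover the original bit string."""
    (n,) = struct.unpack('<I', blob[:4])
    bits = ''.join('{0:08b}'.format(byte) for byte in blob[4:])
    return bits[:n]  # drop the padding bits


original_string = '001011010110000010010'
blob = pack_bits(original_string)

# Recover the 3-bit values from the packed data.
recovered = unpack_bits(blob)
triplets = [recovered[i:i + 3] for i in range(0, len(recovered), 3)]
```

The header costs four bytes per file, which is negligible for large data sets but could be shrunk if the maximum length is known in advance.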