
I have a large text file (3 to 6 GB) containing only two ASCII characters, '0' and '1'. I would like to convert this text into packed bits that can be written out as a plain binary file.

Take the toy 'test.bin' file below, which is 568 bytes of ASCII: 70*8 = 560 characters plus 8 newline bytes. Every '0' and '1' is a character encoded as 1 byte. I'd like the final output to be reduced to a 560-bit (70-byte) file.

0111000110000000101000100000100100011111010010101000001001010000111000
1001100011010100001101110000100010000010000000000001011000010011111100
0100001000010000010000010111011101011111000111111000111001100010100011
0011101000100001111111000001111110111111101101100000011000010101100001
0000000110110001000000000001000011110100000101101000001000010001010011
1101101111010101011110001110000010011001100101101101000111111101110101
1000001100101101010111110111110101100000000011001000100000000011001110
0101101001110010011110000100101001001111010011100100001001111111100110
...

I've found several solutions going the other way, converting a binary file into ASCII, but none that pack ASCII into binary. The ones I tried instead expand each character into its ASCII encoding (1 --> 00110001, 0 --> 00110000), which is not what I want. I found a C++ solution, but I'm looking for a simple bash or Python script.
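To make the size difference concrete, here is a minimal Python sketch (the variable names are only illustrative) showing the 8-bit ASCII patterns of the two characters and how eight text characters collapse into one byte once packed:

```python
# ASCII codes of the two characters, shown as 8-bit patterns.
zero_bits = format(ord("0"), "08b")
one_bits = format(ord("1"), "08b")
print(zero_bits, one_bits)

# Eight text characters (8 bytes on disk) pack into a single byte.
text = "01110001"  # first 8 characters of the sample file
packed = int(text, 2).to_bytes(1, "big")
print(len(text), len(packed), packed)
```

This is the 8:1 reduction being asked for: 560 characters become 70 bytes.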

=====================================================

Bash solution based on a small correction from here

tr -d '\n' < test.bin | perl -pe '$_ = pack("B*", $_)' > true_binary.txt
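One way to sanity-check the output of a command like the one above is to expand the binary file back into '0'/'1' text and compare it with the input. A minimal sketch, where `unpack_bits` is a hypothetical helper name and not part of any library:

```python
def unpack_bits(data: bytes) -> str:
    """Expand packed bytes back into a string of '0' and '1' characters."""
    return "".join(format(byte, "08b") for byte in data)


# One packed byte expands back to the eight characters it came from.
print(unpack_bits(b"\x71"))  # → 01110001
```

For the toy file, `unpack_bits(open('true_binary.txt', 'rb').read())` should reproduce the 560 input characters with the newlines removed, and the output file should be exactly 70 bytes.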
Community
Artem
  • If you found a C++ solution already, why not try to translate that over to Python yourself? Or if you're not interested in learning Python, why not just use the C++ solution? – blacksite Jan 04 '17 at 00:50
  • It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (output, tracebacks, etc.). The more detail you provide, the more answers you are likely to receive. Check the [FAQ](http://stackoverflow.com/tour) and [How to Ask](http://stackoverflow.com/questions/how-to-ask). – TigerhawkT3 Jan 04 '17 at 00:51
  • Can I use C++ functions in python? The C++ solution seems to be for one particular string, I'm not sure how I would expand that to a large file. If it was in python I'd be able to work with it, but I don't understand how that other solution works. I've been trying to solve this for the last 3 hours, it's just outside of any coding I usually do. – Artem Jan 04 '17 at 00:53

1 Answer


I think this is probably better as a comment, but I apparently don't have that privilege yet. If you're on Python 3, this seems like a good relevant solution for you: https://stackoverflow.com/a/21220966/7006570

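The difference is that in that question the asker wants the bits in reverse order and you don't, so drop the [::-1] part and use 'big' instead of 'little' so the bytes stay in the same order as the text. Size the result from the length of the string rather than from bits.bit_length(), otherwise leading zeros are lost. If the length is not a multiple of 8, the padding zeros end up at the front of the first byte. You end up with a bytes object, which you can write to a file. And of course, the length won't always be 4 bytes for you.

bitstring = "10111111111111111011110"
bits = int(bitstring, 2)
# len(bitstring), not bits.bit_length(), keeps leading zeros; 'big' keeps the bytes in text order
bytes_ = bits.to_bytes((len(bitstring) + 7) // 8, 'big')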

and then to save it

with open('/tmp/output', 'wb') as f:
    f.write(bytes_)
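For a 3 to 6 GB input, converting the whole file through a single int() call means holding all of it in memory at once. A chunked variant might be more practical. The following is only a sketch under some assumptions: `pack_bits` and `chunk_size` are hypothetical names, line breaks are simply discarded, and a final partial byte (if any) is padded with zeros on the right.

```python
def pack_bits(src_path, dst_path, chunk_size=1 << 20):
    """Stream a text file of '0'/'1' characters into a packed binary file."""
    leftover = b""  # bits carried over until a whole byte is available
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Drop line breaks, then prepend bits left from the previous chunk.
            bits = leftover + chunk.replace(b"\n", b"").replace(b"\r", b"")
            usable = len(bits) - len(bits) % 8
            if usable:
                # Sizing by usable // 8 keeps leading zero bytes intact.
                dst.write(int(bits[:usable], 2).to_bytes(usable // 8, "big"))
            leftover = bits[usable:]
        if leftover:
            # Assumption: pad the final partial byte with zeros on the right.
            padded = leftover.ljust(8, b"0")
            dst.write(int(padded, 2).to_bytes(1, "big"))
```

Called as `pack_bits('test.bin', '/tmp/output')`, this should turn the 568-byte toy file into 70 bytes. Any character other than '0', '1', or a line break makes int() raise a ValueError, which doubles as a basic input check.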
heemayl
Kevin Wang