How do you Convert Characters to Bits using a Custom Encoding Format?

Question

I've created an encryption-decryption module that works as follows:

1) It takes in a .txt file and reads the text inside

2) Using my encryption algorithm, it converts the original text (OT) to a unique code of 1's and 0's string characters.

3) It then takes the 1's and 0's string characters, encodes them to bytes with the standard format (which writes 8 bits for every 1 or 0) and exports the bytes sequence as a file.

4) It then calls the second script in the module, which reads the bytes file, and opens it as 1's and 0's characters again.

5) Finally, using my decryption algorithm, it takes the 1's and 0's and successfully translates them back to the Original Text.
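For reference, step 3 as described can be illustrated with a tiny sketch. The string `code` is a made-up stand-in for the output of the encryption step, and ASCII is assumed as the "standard format"; the point is only that each '1' or '0' character ends up as one whole byte.

```python
# Sketch of step 3 as described: each '0'/'1' character becomes one byte.
# `code` is a hypothetical stand-in for the encryption output.
code = "1011001"
encoded = code.encode("ascii")  # assumed "standard format"

print(len(code))     # number of 0/1 characters
print(len(encoded))  # number of bytes written: one byte (8 bits) per character
```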

The problem though is with step 3 - I'm unfortunately very new to Python and Comp Sci, so apologies if I garble this explanation:

I've realised that after converting the OT to 1's and 0's characters, instead of converting them to bytes with the usual codec format (i.e. 8 bits per character, on average), I need it to write the actual bit of '1' for every 1 character in the code, and likewise the bit of '0' for every 0 character - for which I presume I'll need some sort of 'custom codec'.

What you get then is a bytes file that has exactly the same number of bits as the number of 1's and 0's characters in the coded sequence, instead of 8 bits per character as before. I'll then do some file measurement on this output and the OriginalText.txt file to make sure my output has fewer bytes (i.e. a smaller file size) than the OT.

Finally of course, the second script takes in this bytes file, translates it back to the same 1's and 0's sequence using my 'custom encoding-decoding format' and then uses my decryption algorithm to turn this into my original message.

Unfortunately, I'm not sure how I could create my own 'custom codec' and instruct Python to write me some sort of bit sequence in this way?
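One possible way to get "one bit per character" is to pack every 8 characters into a single byte. This is a minimal sketch, not a definitive codec: the function names `pack_bits` and `unpack_bits` are hypothetical, and the one-byte header holding the padding count is an assumed layout. A file can only hold whole bytes, so the last byte is padded with zero bits and the header lets the decoder drop them again.

```python
def pack_bits(bit_string: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, 8 characters per byte.

    Assumed layout (for illustration only): the first byte stores how many
    zero bits (0-7) were appended to fill the final byte.
    """
    if set(bit_string) - {"0", "1"}:
        raise ValueError("bit_string may only contain '0' and '1'")
    padding = (-len(bit_string)) % 8
    padded = bit_string + "0" * padding
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([padding]) + body


def unpack_bits(data: bytes) -> str:
    """Reverse pack_bits: rebuild the original '0'/'1' string."""
    padding, body = data[0], data[1:]
    bits = "".join(f"{byte:08b}" for byte in body)
    return bits[:len(bits) - padding]


packed = pack_bits("1011001")          # 7 characters -> 1 header byte + 1 data byte
assert unpack_bits(packed) == "1011001"
```

With this layout the output is roughly one eighth the size of the character string, plus the single header byte.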

Further, say perhaps I manage to achieve the above and get my 'custom bit sequence', and that I have it stored in some kind of array (or other object) of bits (bitArray). Before I write it to the output file, I need to put my bitArray plus a couple other objects into a list. This list is actually the thing that needs to be converted to binary and then exported as a file - for which I'm using pickle, and have done for all my original char-binary/binary-char conversion - leaving the binary inside bitArray unaltered.

Further, when pickle converts the output file back into my list object, the bitArray and its 'custom binary' needs to arrive totally unmodified.

So, on top of not knowing how to write this custom codec, I'm assuming pickle can handle the above as I need, but of course could be totally wrong.

Is there a way I might achieve what I need, and am I right in assuming pickle won't mangle the already-bits inside my bitArray?
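On the pickle part, a quick experiment may help. Python has no standalone "bit" object, so this sketch assumes the bits have already been packed into a `bytes` object; the list contents here are made up for illustration. In Python 3, `pickle.dumps` uses a binary protocol by default and a `bytes` object comes back from `pickle.loads` unchanged, although pickle adds some overhead of its own around the data.

```python
import pickle

# Hypothetical already-packed bits and a made-up list of other objects.
packed = bytes([0b10110010, 0b01000000])
payload = [packed, "some label", 7]

blob = pickle.dumps(payload)   # default protocol, binary in Python 3
restored = pickle.loads(blob)

assert restored[0] == packed            # the packed bytes arrive unmodified
assert isinstance(restored[0], bytes)
print(len(packed), len(blob))           # blob is larger: pickle adds its own framing
```

If the goal is the smallest possible file, the extra bytes pickle writes around the list need to be counted in the size comparison too.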

Thanks very much indeed,

Really appreciate any advice!

There is a contradiction here: *"which writes 8 bits for every 1 or 0"*, versus *"the same number of bits as the number of 1's and 0's"*. Is it 8 bits per 0/1, or is it 1 bit per 0/1? If the latter, how do you plan to deal with the left over when you cannot fill a whole byte with bits? — trincot, Feb 21 '18 at 09:02
@trincot Unfortunately, that's the problem: if I take say a str of a single digit, '1', and tell pickle to convert it in the standard way, it'll give an output file 1 byte in size (i.e. 8-bits). But what I need is for my script, before we pickle, to take a single chr '1' and instead output a single _bit_ (whose output file would thus be 1/8th of a byte in size). A list of chr 1's and 0's would give me then an output of 1 bit per chr, stored in a list or tuple. I would then take this output object, package it inside a list, and convert the whole list with pickle. Hopefully. — TheRealPaulMcCartney, Feb 21 '18 at 09:10
Suppose that you could make it work, what will you do when the number of bits is not a multiple of 8? Where will you store that information, so that when decoding the bits-file you will not decode a multiple of 8 bits? — trincot, Feb 21 '18 at 09:14
Note that the pickle data format uses a printable ASCII representation, so it is not your desired format. — trincot, Feb 21 '18 at 09:16
Well, I'm hoping: when pickle converts my list of objects (including my bitArray) to binary and outputs it to a file, it leaves a 'note' for itself that says "The elements of bitArray were _already_ binary when we found them, so we didn't 'convert them again' and left them as bits. Conversely when unpickling this whole binary file, _leave_ the elements in bitArray as singular bits, as that's how we originally found them." So after unpickling, my 2nd script will take the elements of bitArray, and convert to 1's and 0's by reversing the 'custom codec' my 1st script used to go from 1/0 -> single bit. — TheRealPaulMcCartney, Feb 21 '18 at 09:32
Like I said, if you want a file with 0 and 1 bits, where every bit corresponds to your original 0 and 1s, then pickle is not your tool, because pickle writes *printable* characters. Now it all depends on what your hard requirements are. Maybe you don't really need the file to have 1 bit per original 0/1, but just need a reasonable compression? What exactly is your higher level need? Why reinvent the wheel (https://docs.python.org/3/library/archiving.html)? — trincot, Feb 21 '18 at 09:38
@trincot Ah right, ok. So, if pickle encounters something that is _already_ in bits, amongst other perfectly handleable python objects, it won't know what to do with those bits and will cause an error, or something, rather than just outputting them as unmodified/unconverted bits? Unfortunately I was told for this project I have to use pickle (as far as I understand) so I'll have to go back to them and pitch this all to them, it looks like. — TheRealPaulMcCartney, Feb 21 '18 at 09:42
Sure, pickle can handle it all, but it will *produce* printable characters, even for non-character data. This seems contrary to your requirements. — trincot, Feb 21 '18 at 09:44
Well, the reason I have to have 'a bit per 1 or 0' in this way is a combination of being kinda instructed to do it that way, and because to do otherwise I'd have to massively re-engineer the rest of my code. As for the output - if e.g. I gave pickle say a regular list with 4 bits inside as elements, put there in my special way (i.e. each bit maps to 1 or 0), pickle it, then unpickle it - after rebuilding the list, when handling its bit elements, if pickle then attempts to convert those 4 bits into some kind of character, rather than just leaving them alone, then yeah that would not be what I want? — TheRealPaulMcCartney, Feb 21 '18 at 13:06
Perhaps this will help: https://stackoverflow.com/questions/41666947/passing-a-sequence-of-bits-to-a-file-python?noredirect=1&lq=1 — samgak, Feb 22 '18 at 08:40


0 Answers