I have a binary file to read. It consists of 256 segments, each with the following structure:

  • First byte: an integer giving the length, in bits, of the field that follows it. This byte doesn't necessarily end at a byte boundary.
  • Variable number of bits: the field to read. It doesn't necessarily end at a byte boundary either.

The file is padded with 0 bits at the end so that it finishes on a byte boundary.

I'm struggling to find a method that reads as few individual bits as possible. My idea is to read the length first, divide it by 8 and read that many whole bytes, and then use the remainder of that division (if any) to read the rest of the field bit by bit. I'm not sure this is ideal, however. Any suggestions?

Edit: Attached is a link to the files. The readable file shows the format in which I would like to print the contents of the binary file. To take an example from the desired output:

  • length for 9c: 4
  • code for 9c: 1101

Here 4 is the first byte read from the binary file, and 1101 is the variable-length bit field that follows it.
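
To make the layout concrete, here is a minimal sketch of decoding segments from a buffer that is already in memory. It assumes bits are stored most significant bit first, which the question does not state; the names `bit_at` and `decode_segment` are made up for illustration.

```c
#include <stddef.h>

/* Return bit number `pos` of buf, counting from the most significant
   bit of buf[0]. MSB-first order is an assumption. */
unsigned bit_at(const unsigned char *buf, size_t pos)
{
    return (buf[pos / 8] >> (7 - pos % 8)) & 1u;
}

/* Decode one segment starting at bit *pos (which must be <= nbits):
   an 8-bit length, then that many code bits, written into `code` as
   '0'/'1' text. `code` needs room for 256 chars.
   Returns the length, or -1 if the buffer ends too early. */
int decode_segment(const unsigned char *buf, size_t nbits,
                   size_t *pos, char *code)
{
    unsigned len = 0;

    if (nbits - *pos < 8)
        return -1;
    for (int i = 0; i < 8; i++)
        len = (len << 1) | bit_at(buf, (*pos)++);

    if (nbits - *pos < len)
        return -1;
    for (unsigned i = 0; i < len; i++)
        code[i] = (char)('0' + bit_at(buf, (*pos)++));
    code[len] = '\0';
    return (int)len;
}
```

Under that assumption, the bytes `0x04 0xD0 0x34` hold two segments: length 4 with code 1101, then length 3 with code 010, followed by one padding bit. Note that the second length byte straddles two file bytes.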

https://ln2.sync.com/dl/e85dc8b40/3f5wbhaq-kxz3ijv8-wuts3t32-442gbsh2

user1303
  • AFAIK it isn't possible to read less than one byte from a file. – kaylum May 25 '20 at 11:54
  • From the definition: segments start at a byte boundary; the bits also start at a byte boundary. So read the first byte, read the remainder of the segment (rounded up to bytes), do what you want with the bits (e.g. shift them right). – Paul Ogilvie May 25 '20 at 11:56
  • I was thinking of doing something along these lines but it doesn't seem like this method will be efficient overall: https://stackoverflow.com/a/11680278/13613024 – user1303 May 25 '20 at 11:57
  • I would just read the _First Byte_ (or the N _length bytes_), determine the number of bits to be read (M bits) and then go on reading from the file `ceil(M/8)` bytes. Then I would pass this char buffer to a converting function, that would sort it out "playing" conveniently with shifts & masks. – Roberto Caboni May 25 '20 at 11:58
  • With the process of dealing with the variable bit field, say you read into the bits belonging to the next segment, how would you deal with those bits so that you can find the correct length of the next segment then? – user1303 May 25 '20 at 12:15
  • Let's say you have a misalignment of 3 bits. `Byte(N) = (Source(N-1) << 3) + Source(N) & 0x07` (where 0x07 is b00000111 -> the 3 low bits set to 1). I'm assuming that all bytes in the data section have the same misalignment. Am I wrong? – Roberto Caboni May 25 '20 at 12:23
  • "The file ends with 0's padded out so that it ends at a byte boundary." I strongly suspect each segment is also padded. Post a link that describes the encoding in detail. – chux - Reinstate Monica May 25 '20 at 12:42
  • `Any suggestions?` start with a *naive* implementation, fetching one bit at a time. This needs a 1-byte buffer + a count of the remaining bits inside this buffer. make it a structure. – wildplasser May 25 '20 at 12:56
  • @wildplasser Depending on what OP's question means in detail, a buffer of 32 bytes may be needed. – klutt May 25 '20 at 13:13
  • @RobertoCaboni Ah your code confuses me a little bit. What exactly is byte, source and n? And yes, since there is no padding between the segments (only at the very end of the file) , the data in that specific segment will also be affected too – user1303 May 26 '20 at 04:03
  • It means that the Nth output byte can be calculated by manipulating the Nth and (N-1)th source bytes. Edit your question with an example file and the desired output (for example "Hello world" and the corresponding encoded file) and I'll try to build an answer (me or any of the other guys). Currently there are details I'm not sure about and I don't feel comfortable answering: an example is required. – Roberto Caboni May 26 '20 at 05:37
  • I've added a link to the example files and an explanation of the desired output! – user1303 May 26 '20 at 06:24
  • imagine a person arrives to your question in two years time, will you still have a link to your sync.com account then here? IOW better to write all info into the question. As to your question, I think your approach should work, at the end of the day you can only read full bytes so there will be bit fiddling involved. – AndersK May 26 '20 at 06:35
  • another way to approach this - if you have a max size is to use a struct with bitfields – AndersK May 26 '20 at 06:42
  • Please show a hex-representation of a simple example of an input file. – RubberBee Jun 02 '20 at 07:30
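
One possible reading of the shift-and-mask idea discussed in the comments above, as a minimal sketch. It assumes MSB-first bit order and a known bit offset; `realign` and `shift` are hypothetical names, not something from the thread. Because the fields in this format vary in length, the offset would have to be recomputed for every segment.

```c
#include <stddef.h>

/* Copy n bytes of data that begin `shift` bits (0..7) into src[0],
   MSB first, into dst so that they become byte-aligned.
   When shift != 0, src must hold at least n + 1 bytes. */
void realign(unsigned char *dst, const unsigned char *src,
             size_t n, unsigned shift)
{
    for (size_t i = 0; i < n; i++) {
        /* High part: the remaining bits of the current source byte. */
        unsigned hi = (unsigned)src[i] << shift;
        /* Low part: the leading bits of the next source byte. */
        unsigned lo = shift ? (unsigned)src[i + 1] >> (8 - shift) : 0u;
        dst[i] = (unsigned char)((hi | lo) & 0xFFu);
    }
}
```

For example, if a length byte starts 4 bits into the source bytes `0xD0 0x34`, calling `realign(dst, src, 1, 4)` stores 0x03 in `dst[0]`.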

1 Answer

The naive method of fetching one bit at a time works excellently (for small files). The input is actually completely unaligned: just a series of bits, without any padding.

[I'll delete this answer in 1 minute, because I don't want to do someone's homework]


#include <stdio.h>
#include <stdlib.h>

#define the_path "/home/Download/binary_file.dict"

struct bitfile {
        FILE *fp;
        unsigned char byte;
        unsigned char left;
        };

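struct bitfile * bfopen(char *path)
{
struct bitfile *bp;

bp = malloc(sizeof *bp);
if (!bp) return NULL;
bp->fp = fopen(path, "rb" );
if (!bp->fp) { free(bp); return NULL; } // could not open the file
bp->byte = 0; // the byte currently being consumed
bp->left = 0; // number of unread bits left in byte

return bp;
}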
int bfclose(struct bitfile * bp)
{
int rc;
rc = fclose(bp->fp);
free(bp);
return rc;
}

int bfgetb(struct bitfile * bp)
{
int ch;
if (!bp->left) {
        ch = fgetc(bp->fp);
        if (ch < 0) return EOF;
        bp->byte = ch;
        bp->left = 8;
        }
bp->left -= 1;
ch = bp->byte & (1u << bp->left) ? 1 : 0;
// bp->byte >>= 1;
return ch;
}

void bfflush(struct bitfile * bp)
{
bp->left =0;
}

unsigned bp_get_n( struct bitfile *bp, unsigned bitcount)
{
unsigned val=0;

while(bitcount--) {
        int ch;
        ch = bfgetb(bp);
        if (ch < 0) return (unsigned)EOF; // callers must compare against (unsigned)EOF
        val <<=1;
        val |= ch;
        }
return val;
}

int main(void)
{
struct bitfile *bp;
int ch;
unsigned iseg, ibit, nbit;

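bp = bfopen( the_path);
if (!bp) { perror(the_path); return EXIT_FAILURE; }

for (iseg =0; iseg <16*16; iseg++) {
        // bfflush(bp);
        nbit = bp_get_n(bp, 8);
        if (nbit == (unsigned)EOF) break; // file ended before all 256 segments were read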
        fprintf(stdout, "Seg%u: %u bits\n", iseg, nbit);
        fprintf(stdout, "payload:");
        for (ibit=0; ibit < nbit; ibit++) {
                ch = bfgetb(bp);
                if (ch < 0) break;
                fputc( '0'+ ch, stdout);
                }
        fprintf(stdout, ".\n");
        }
bfclose(bp);
return 0;
}
wildplasser
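
As a variation on the answer above, and not part of it, the following sketch pulls whole bytes into an accumulator and hands out several bits per call, which is closer to the "as few individual bits as possible" goal in the question. It assumes MSB-first bit order; `struct bitreader` and `br_get` are hypothetical names.

```c
#include <stdio.h>

struct bitreader {
    FILE *fp;
    unsigned long acc;  /* unread bits sit in the low `count` bits */
    unsigned count;     /* number of valid bits in acc */
};

/* Read n bits (1..24), MSB first.
   Returns the value, or -1 if the file ends before n bits are available. */
long br_get(struct bitreader *br, unsigned n)
{
    /* Refill one whole byte at a time until enough bits are buffered. */
    while (br->count < n) {
        int ch = fgetc(br->fp);
        if (ch == EOF)
            return -1;
        br->acc = (br->acc << 8) | (unsigned long)ch;
        br->count += 8;
    }
    /* Hand out the n oldest buffered bits. */
    br->count -= n;
    return (long)((br->acc >> br->count) & ((1UL << n) - 1UL));
}
```

A reader initialised as `struct bitreader br = { fp, 0, 0 };` would fetch each length with `br_get(&br, 8)`. The limit of 24 bits per call keeps the accumulator within a 32-bit `unsigned long`, so a field of up to 255 bits would be fetched in several calls.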