
I want to read, e.g., the 11th through 23rd bytes of a binary (.bin) file that looks like this in a hex viewer: https://i.stack.imgur.com/9KJ1S.jpg, and print some parts as an integer and other parts as a name (string), preferably without using any [ ], only operations on pointers.

My example .bin file contains the following: the first 4 bytes (blue highlight) are the length of the name, then 2 bytes are the name in ASCII. The next 4 bytes (blue underline) are the length of the surname (red underline), and the last one is the index.

My attempt:
After loading the entire .bin file into a buffer exactly as shown here: http://www.cplusplus.com/reference/cstdio/fread/, I tried in many ways, without success, to assign parts of this buffer to variables (or a structure) and then printf them with a format string, just to see what got assigned.

 char *name_length = malloc(4);
 char *pEnd;
 for(*buffer=0; *buffer<4; *buffer++) {
     sscanf(buffer, "%s", name_length);
     long int i = strtol (buffer, &pEnd, 16);
     printf("%x", i);
 }

The above (wrong) code prints 0000 (I imagine it is completely rotten from its roots, though I don't know why). In case there is an elegant way to load parts of the buffer straight into a structure, here's the declaration:

struct student_t
{
    char name[20];
    char surname[40];
    int index;
};

The "closest" result I could get is from another piece of code, which prints "2000." from my .bin file: "02 00 00 00 46 2E", which means "2 0 0 0 /length/ F. /string/"

  for(int i=0; i<4; i++)
  printf("%d", buffer[i]); //it's supposed to print first 4 hex digits...
  for(int j=5; j<7; j++)
  printf("%s", &buffer[j]); //it's supposed to print from 5th to 7th...

Thanks a lot for all the help and guidance.

Immo

2 Answers


sscanf() is not the correct tool to use for processing binary data like this.

You will get far better results working from something that looks like your last section of code, where you index each character in the buffer directly, and process it on a character by character basis.

Note that this is written assuming buffer is a pointer to characters, not a character array.

What you'll need to do is read four characters to get the length:

struct student_t result;
int length = 0;
int i;
// Progress backwards down data since it's stored "little endian"
for (i = 3; i >= 0; i--)
{
     length = (length << 8) + (buffer[i] & 255);
}

We've just consumed four bytes, move the buffer pointer forward to skip over them:

buffer += 4;
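
The question asks for pointer operations only, without [ ]. As a minimal sketch, assuming the buffer holds the raw file bytes and the length is a 4-byte little-endian value as in the loop above, a helper such as the hypothetical read_u32_le below decodes the value and advances the pointer in a single step. It works on unsigned char, so no masking with 255 is needed.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the original answer): decode a 32-bit
   little-endian value at *cursor using pointer arithmetic only, then
   advance *cursor past the four bytes that were consumed. */
static uint32_t read_u32_le(const unsigned char **cursor)
{
    const unsigned char *p = *cursor;
    uint32_t value = (uint32_t)*p
                   | ((uint32_t)*(p + 1) << 8)
                   | ((uint32_t)*(p + 2) << 16)
                   | ((uint32_t)*(p + 3) << 24);

    *cursor = p + 4;
    return value;
}
```

Called as read_u32_le(&cursor), with cursor pointing at the start of a length field, it returns the length and leaves cursor on the first character of the string that follows.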

We have the length, and our buffer pointer now addresses the first character of the name. Read that many characters and save them:

for (i = 0; i < length; i++)
{
    result.name[i] = *buffer++;
}
// Add a NUL byte to terminate the string.
result.name[i] = '\0';

That'll read the name, and in so doing it's moved the buffer pointer to address the first byte of the next length value. All you then do is reset length to zero, and repeat the above to read in the surname.
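
The copy loop above trusts the length stored in the file, so a corrupt or unexpectedly long value could write past the end of name[20] or surname[40]. The sketch below is one possible way to guard against that, not the answer's own code: read_field is a hypothetical helper, and it assumes each field is a 4-byte little-endian length followed by that many characters, with end pointing one past the last byte that was loaded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read one length-prefixed field starting at *cursor,
   copy it into dst as a NUL-terminated string, and advance *cursor past it.
   Returns 0 on success, -1 if the field is truncated or does not fit in dst;
   on failure *cursor is left unchanged. */
static int read_field(const unsigned char **cursor, const unsigned char *end,
                      char *dst, size_t dst_size)
{
    const unsigned char *p = *cursor;
    uint32_t length;

    if ((size_t)(end - p) < 4)
        return -1;                       /* not enough bytes for the length */

    length = (uint32_t)*p
           | ((uint32_t)*(p + 1) << 8)
           | ((uint32_t)*(p + 2) << 16)
           | ((uint32_t)*(p + 3) << 24);
    p += 4;

    if (length > (size_t)(end - p))
        return -1;                       /* string runs past the buffer */
    if (dst_size == 0 || length > dst_size - 1)
        return -1;                       /* no room for the text plus NUL */

    memcpy(dst, p, length);
    *(dst + length) = '\0';
    *cursor = p + length;
    return 0;
}
```

With a struct student_t result, the two strings would then be read with read_field(&cursor, end, result.name, sizeof result.name) followed by read_field(&cursor, end, result.surname, sizeof result.surname), checking the return value each time.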

dgnuff

Considering that I saved your exact binary data in a file called data.bin, here's an example:

code.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <errno.h>

#define FILE_NAME "data.bin"


typedef struct Record_ {
    uint32_t nameLen, surnameLen;
    char *name, *surname;
} Record;


void printRecord(Record record) {
    printf("\nPrinting record:\n  Name length: %u\n  Name: [", record.nameLen);
    if ((record.nameLen != 0) && (record.name != NULL)) {
        char *pc;
        for (pc = record.name; pc < record.name + record.nameLen; pc++) {
            printf("%c", *pc);
        }
    }
    printf("]\n  Surname length: %u\n  Surname: [", record.surnameLen);
    if ((record.surnameLen != 0) && (record.surname != NULL)) {
        char *pc;
        for (pc = record.surname; pc < record.surname + record.surnameLen; pc++) {
            printf("%c", *pc);
        }
    }
    printf("]\n");
}


void clearRecord(Record *pRecord) {
    free(pRecord->name);
    free(pRecord->surname);
    memset(pRecord, 0, sizeof(Record));
}


int readRecord(FILE *pFile, Record *pRecord) {
    size_t readBytes = fread(&pRecord->nameLen, sizeof(pRecord->nameLen), 1, pFile);
    if (pRecord->nameLen != 0) {
        pRecord->name = malloc(pRecord->nameLen);
        readBytes= fread(pRecord->name, 1, pRecord->nameLen, pFile);
    }
    readBytes = fread(&pRecord->surnameLen, sizeof(pRecord->surnameLen), 1, pFile);
    if (pRecord->surnameLen != 0) {
        pRecord->surname = malloc(pRecord->surnameLen);
        readBytes = fread(pRecord->surname, 1, pRecord->surnameLen, pFile);
    }
    return 0;
}


int main() {
    FILE *fp = fopen(FILE_NAME, "rb");  // read-only binary mode; the program never writes to the file
    if (fp == NULL)
    {
        printf("Error opening file: %d\n", errno);
        return 1;
    }
    Record record = {0, 0, NULL, NULL};
    printRecord(record);
    int ret = readRecord(fp, &record);
    if (ret)
    {
        printf("readRecord returned %d\n", ret);
        fclose(fp);
        return 2;
    }
    printRecord(record);
    clearRecord(&record);
    fclose(fp);
    return 0;
}

Notes:

  • After loading entire .bin file to buffer exactly like presented here

    Usually, this is not a very good idea. Only read as much as you need. Imagine that you want to read 10 bytes from a file that is hundreds of MiB in size: loading all of it would be a complete waste of resources, and might sometimes even lead to crashes.

  • It seems that you have a simple protocol here:

    1. 4 bytes for the name length - this is a uint32_t
    2. A variable number of bytes given by the name length for name - this is a char *, as its length is not known at compile time (you could have an array like: char[SOME_MAX_NAME_LENGTH] where you know for sure that in the previous field there will never be a value greater than SOME_MAX_NAME_LENGTH, but I like this approach better)
    3. Same as #1, applied to the surname length
    4. Same as #2, applied to the surname


    This maps onto the Record structure (the order of the members in the struct is not important, only the order in which they are initialized). Things could be taken even further: since the data for the surname has the same layout as the data for the name, there could have been an inner structure containing the name data, with Record containing only an array of 2 elements of that structure.
    But even though things would be simpler that way (and the code in the functions would also be shorter, without duplication), I didn't do it because it would probably be less obvious.

  • printRecord - displays Record data in a user-friendly manner (you can notice the pointer logic here when printfing the characters individually)

  • clearRecord - frees the memory occupied by the char * members and initializes everything to 0

  • readRecord - reads data from file and populates the record

    • It does not have any error handling, since code is already pretty long. But you should always check and handle errors (function return codes: e.g. fread)
    • Be careful when reconstructing (integer) values from individual bytes, as you might get unexpected results due to endianness. Check [SO]: Python struct.pack() behavior (@CristiFati's answer) (or of course, Google) for more info on this topic
    • Read 4 bytes for the size, then (allocate and) read "size" bytes for the string (I might be wrong here, but I don't think that sscanf (functions family) is supposed to work with binary data (except for strings))
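
The notes above point out that readRecord skips error handling and that endianness matters when rebuilding integers from bytes. As a rough sketch only, a checked variant might look like the code below. The names readLength, readString and readRecordChecked are hypothetical, and the length fields are assumed to be little-endian, which matches the sample data (02 00 00 00 for a length of 2).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Record_ {
    uint32_t nameLen, surnameLen;
    char *name, *surname;
} Record;

/* Read a 4-byte little-endian length, independent of the host byte order.
   Returns 0 on success, -1 on a short read. */
static int readLength(FILE *pFile, uint32_t *pLen) {
    unsigned char raw[4];
    if (fread(raw, 1, sizeof raw, pFile) != sizeof raw) {
        return -1;
    }
    *pLen = (uint32_t)raw[0] | ((uint32_t)raw[1] << 8)
          | ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
    return 0;
}

/* Read one length-prefixed string into freshly allocated memory.
   *pText stays NULL when the length is 0 or when an error occurs.
   Returns 0 on success, -1 on a short read, -2 if malloc fails. */
static int readString(FILE *pFile, uint32_t *pLen, char **pText) {
    *pText = NULL;
    if (readLength(pFile, pLen) != 0) {
        return -1;
    }
    if (*pLen == 0) {
        return 0;
    }
    *pText = malloc(*pLen);
    if (*pText == NULL) {
        return -2;
    }
    if (fread(*pText, 1, *pLen, pFile) != *pLen) {
        free(*pText);
        *pText = NULL;
        return -1;
    }
    return 0;
}

/* Hypothetical checked variant of readRecord: 0 on success, nonzero on failure.
   On failure no allocated memory is left behind in the record. */
int readRecordChecked(FILE *pFile, Record *pRecord) {
    int ret = readString(pFile, &pRecord->nameLen, &pRecord->name);
    if (ret != 0) {
        return ret;
    }
    ret = readString(pFile, &pRecord->surnameLen, &pRecord->surname);
    if (ret != 0) {
        free(pRecord->name);
        pRecord->name = NULL;
    }
    return ret;
}
```

Because the length is assembled byte by byte, the result does not depend on the byte order of the machine running the program, unlike reading straight into a uint32_t with fread.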

Output:

[cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q052085090]> gcc code.c -o code.exe && ./code.exe

Printing record:
  Name length: 0
  Name: []
  Surname length: 0
  Surname: []

Printing record:
  Name length: 2
  Name: [F.]
  Surname length: 13
  Surname: [MurrayAbraham]
CristiFati
  • I'll just post cpp check report (though the method is very educating, thanks a lot!): `(warning) %d in format string (no. 1) requires 'int' but the argument type is 'unsigned int'. (warning) %d in format string (no. 1) requires 'int' but the argument type is 'unsigned int'. [invalidPrintfArgType_sint] (warning) Size of pointer 'pRecord' used instead of size of its data. This is likely to lead to a buffer overflow. You probably intend to write 'sizeof(*pRecord)'. [pointerSize] (style) Variable 'readBytes' is assigned a value that is never used. [unreadVariable]` Thanks a lot! – Immo Sep 01 '18 at 13:10
  • in function clearRecord(), there should be `memset(pRecord, 0, sizeof(*pRecord));` instead of "pRecord". (based on cppcheck) – Immo Sep 01 '18 at 13:24
  • Thank you for the note. It was a typo (that could lead to segfaults if `clearRecord` would be applied repeatedly on the same object). – CristiFati Sep 01 '18 at 15:20