
I have a binary data file that I am trying to read. The values in the file are 8-bit unsigned integers, with "record" delimiters that are ASCII text ($MSG, $GRP, for example). I read the data as one big chunk, as follows:

unsigned char *inBuff = (unsigned char*)malloc(file_size*sizeof(unsigned char));  
result = fread(inBuff, sizeof(unsigned char), file_size, pFile);

I need to search this array to find records that start with $GRP, so I can then read the data that follows. Can someone suggest a good way to do this? I have tried several things, and none of them has worked. For example, my most recent attempt was:

std::stringstream str1;
str1 << inBuff;
std::string strTxt = str1.str();

However, when I check the length of the resulting string, it is only 5. I looked at the file in Notepad and noticed that the sixth character is a NUL (zero) byte, so the string seems to be cut off there. Any ideas?

Jongware
  • Don't mix C and C++. Go with `fstream` and read your file into a `std::string` directly. And if you still want/need to stick with your C code, [don't cast the result of malloc](http://stackoverflow.com/a/605858/3460805). – Chnossos May 01 '14 at 20:09
  • Don't mix C and C++. Go with `for (i=0; i – user3386109 May 01 '14 at 20:15
  • Can you provide some sample code and sample files? Sounds easy enough to implement via `strstr()` and `sscanf()`. – Cloud May 01 '14 at 20:15
  • @Dogbert Given that the file contains 8-bit unsigned integers, some of which are 0 (aka NULL), I'm pretty sure `strstr` is not a viable solution, and I question whether `std::string` will work either. From the problem description, what OP has is a binary file, not a simple text file. – user3386109 May 01 '14 at 20:21
  • @user3386109 It's a binary file effectively, containing binary and ASCII values (hybrid really). It wouldn't be hard to delimit though, and if we treat it as ASCII, we can find boundaries. – Cloud May 01 '14 at 20:23
  • @Dogbert: have to disagree with you. C `str` functions assume *zero-terminated* strings. Any C string function will stop at the very first binary 0. Use `memchr` to locate the $ and then use `strncmp` or `memcmp`. In particular, do not assume the byte immediately *after* the 4-byte identifier is a binary 0. – Jongware May 01 '14 at 21:00
  • @user3386109 I was thinking of something similar, but left out a lot of details in my earlier comment. In any case, I was aware that `strstr()` requires a trailing NULL character, but your point is valid nonetheless. – Cloud May 01 '14 at 22:12
  • Thank you all for your suggestions! Using memcmp has worked for me. I appreciate the help. – user3594029 May 02 '14 at 13:04

2 Answers


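fread returns the number of items it actually read (a size_t, so it is never -1). With an item size of 1, as in your call, that value tells you how many bytes are available to search; a value smaller than file_size means an error or an early end of file.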
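
As a rough illustration of that check, the sketch below reads a file in binary mode and reports how many bytes fread actually delivered. The helper name read_whole_file and its signature are assumptions made for this example; they do not come from the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a malloc'd
   buffer. On success returns the buffer and stores the number of bytes
   read in *out_len; on failure returns NULL. The caller frees the buffer. */
unsigned char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");   /* "rb": binary mode, no text translation */
    unsigned char *buf;
    long size;
    size_t got;

    if (!fp)
        return NULL;

    /* determine the file size by seeking to the end */
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0 ||
        fseek(fp, 0, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    buf = malloc(size > 0 ? (size_t)size : 1);   /* avoid malloc(0) */
    if (!buf) {
        fclose(fp);
        return NULL;
    }

    /* fread returns the number of items read; with item size 1 that is bytes */
    got = fread(buf, 1, (size_t)size, fp);
    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }

    fclose(fp);
    *out_len = got;   /* search only this many bytes */
    return buf;
}
```

The length comes back through out_len because the buffer cannot describe its own size: it may contain zero bytes anywhere.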

A string search cannot be expected to work on binary data: the data may contain NUL bytes, and any function that finds the length by looking for a terminator will stop at the first one.

One possible way to search the data is to call memcmp at each position in the buffer, passing your search key and the length of the search key.
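
A minimal sketch of that idea, assuming a hypothetical helper named find_bytes that returns an offset into the buffer (or (size_t)-1 when the key is absent):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first occurrence of
   key[0..key_len) within buf[0..buf_len), or (size_t)-1 if not found.
   Lengths are passed explicitly, so zero bytes in buf are harmless. */
size_t find_bytes(const unsigned char *buf, size_t buf_len,
                  const void *key, size_t key_len)
{
    size_t i;

    if (key_len == 0 || key_len > buf_len)
        return (size_t)-1;

    /* stop at the last position where the whole key still fits */
    for (i = 0; i + key_len <= buf_len; i++) {
        if (memcmp(buf + i, key, key_len) == 0)
            return i;
    }
    return (size_t)-1;
}
```

With the buffer from the question, a call might look like find_bytes(inBuff, result, "$GRP", 4), where result is the count returned by fread.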

EvilTeach

(As per my comment)

C str functions assume zero-terminated strings. Any C string function will stop at the very first binary 0. Use memchr to locate the $ and then use strncmp or memcmp. In particular, do not assume the byte immediately after the 4-byte identifier is a binary 0.

In code (C, not tested):

#include <string.h>

/* recordId should point to a simple string such as "$GRP" */
unsigned char *find_record (unsigned char *data, size_t max_length, const char *recordId)
{
    unsigned char *ptr = data;
    unsigned char *end = data + max_length;
    size_t id_length = strlen(recordId);

    if (id_length == 0 || id_length > max_length)
        return NULL;

    /* stop once fewer than id_length bytes remain, so memcmp cannot
       read past the end of the data */
    while ((size_t)(end - ptr) >= id_length)
    {
       /* fast scan for the first character only, limited to the
          positions where the whole identifier still fits */
       ptr = memchr (ptr, recordId[0], (size_t)(end - ptr) - id_length + 1);
       if (!ptr)
          return NULL;

       /* first character matches, test entire string */
       if (!memcmp (ptr, recordId, id_length))
          return ptr;

       /* no match; test onwards from the next possible position */
       ptr++;
    }

    return NULL;
}
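
To visit every record rather than only the first, the search can be restarted just past each match. The sketch below is one way to do that; it uses the same memchr-then-memcmp approach but is self-contained, and the names next_match and count_records are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: first occurrence of id (id_len bytes, id_len > 0)
   in [start, end), or NULL. memchr skips ahead, memcmp confirms. */
static const unsigned char *next_match(const unsigned char *start,
                                       const unsigned char *end,
                                       const char *id, size_t id_len)
{
    while ((size_t)(end - start) >= id_len) {
        /* only positions where the whole id still fits are candidates */
        const unsigned char *p =
            memchr(start, id[0], (size_t)(end - start) - id_len + 1);
        if (!p)
            return NULL;
        if (memcmp(p, id, id_len) == 0)
            return p;
        start = p + 1;
    }
    return NULL;
}

/* Counts the records tagged with id. A real program would parse the
   payload starting at p + id_len instead of just counting. */
size_t count_records(const unsigned char *data, size_t len, const char *id)
{
    const unsigned char *end = data + len;
    const unsigned char *p = data;
    size_t id_len = strlen(id);
    size_t count = 0;

    if (id_len == 0)
        return 0;

    while ((p = next_match(p, end, id, id_len)) != NULL) {
        count++;
        p += id_len;   /* resume just after this identifier */
    }
    return count;
}
```

Resuming at p + id_len assumes identifiers do not overlap, which holds for tags like $GRP that begin with a character appearing nowhere else in the tag.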
Jongware