
I need to read a large Intel HEX file and, based on the data type of each line, store the data in a string/character array to use later on. Below is the code. I use chunk to read a line from the hex file, check the data type in the line I read, use sub to store the data parsed from that line, and use finaldata to keep accumulating the data as I read. The problem is size: the max character array size is 65535 (correct me if I am wrong), but my data is around 80,000 bytes (120K characters). How can I tackle this in C? Or would it be better to switch to C++ or C#? Thanks in advance for any help/insight you can provide.

Edit: The hex data in the file looks like this:

:020000040200F1
:10C00000814202D8BFF32F8F10BD441C42E8004366

I need to read this data line by line and check each line's data type (the two characters after the address: 04 in the first line, 00 in the second). If it's 00, parse the data starting at the next byte (the byte after the data type) and read until the end of the line, excluding the last byte (which is the checksum). Then move to the next line; if its data type is 00, parse the data and append it to the previously read data (string concatenation). So in the end the variable needs to store a large amount of data, and this is where I am struggling: how do I store that much data in a single variable?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        exit(1);
    }

    FILE *fp;
    fp = fopen(argv[1], "rb");
    if(fp == NULL) {
        perror("Unable to open file!");
        exit(1);
    }

    char chunk[128];
    char sub[128] = {0};  /* zero-initialized so the first strcat() never reads garbage */

    char finaldata[65535];
    finaldata[0] = '\0';
    // Store the chunks of text into a line buffer
    size_t len = sizeof(chunk);

    while(fgets(chunk, sizeof(chunk), fp) != NULL) {
        //fputs(chunk, stdout);
        size_t a = 0;

        if((chunk[7] == '0') && (chunk[8] == '0')) {
            size_t length = strlen(chunk);

            while (length > 13 && a < (length-13)) {
                sub[a]=chunk[9+a];
                a++;

            }
        }
        strcat(finaldata, sub);
        fputs(finaldata, stdout);
        memset(sub,0,sizeof(sub));
        printf("\n\n");

    }

    fclose(fp);

    printf("\n\nMax line size: %zu\n", len);

    return 0;
}
Joe
  • *the max character array size is 65535 (correct me if I am wrong)* - yes, you are wrong. https://stackoverflow.com/questions/9386979/what-is-the-maximum-size-of-an-array-in-c – Eugene Sh. Jun 18 '20 at 15:10
  • when you need a large array, do not place it on the stack; use dynamic memory allocation (`malloc`) – bruno Jun 18 '20 at 15:13
  • Yes, C++ would provide you with ready-to-go "container classes" which you might indeed appreciate very much. You *can* get the job done with "straight C" – using bruno's advice to use `malloc`, but it sure is handy to just "grab an appropriate container-class off the shelf" and know that it will work as advertised. "Because laziness is a virtue," I use C++ now in every case where I'd have used C. (And I use other, interpreted languages far more often.) – Mike Robinson Jun 18 '20 at 15:17
  • BTW - "80,000 bytes (120K characters)." - sounds weird. Is your "character" smaller than a byte? – Eugene Sh. Jun 18 '20 at 15:17
  • `strcmp ("00", data_type)` has undefined behavior: the array length is only 2 and you set its 2 elements without a null terminating char. Calling `strcat(finaldata, sub);` in your loop is needlessly expensive, since it rescans *finaldata* from the start every time; at least save the end position/pointer each time. And why use the intermediate array *sub* at all? – bruno Jun 18 '20 at 15:18
  • Where do the 13 and 9 in `length-13` and `chunk[9+a]` come from? Why initialize *b* and then increment it when you do not use it? Perhaps you wanted `printf("\n\nMax line size: %d\n", b);` – bruno Jun 18 '20 at 15:23
  • @EugeneSh., thanks for the info. – Joe Jun 18 '20 at 16:36
  • @MikeRobinson, thanks Mike, I'll give C++ a try – Joe Jun 18 '20 at 16:37
  • @bruno I tried dynamic memory allocation and was getting errors, so I switched to a static array and it executed fine; I'll switch back to dynamic memory allocation and try again. I don't have much experience with C, which is very visible in the way I handled strings (as you mentioned in your comment). Regarding 13 and 9, I use them to parse the data from the one line that is read from the file. As for the variable b, I was using it for something I removed from my code but forgot to remove the variable; thanks for pointing that out. – Joe Jun 18 '20 at 16:44
  • @Joe as I already said, you have (at least) a problem with `data_type`; maybe using a dynamically allocated array makes the undefined behavior more visible. You need to size it 3 and do `data_type[2] = 0;`, or better, remove that variable and its use and replace the test with `if((chunk[7] == '0') && (chunk[8] == '0')) {` – bruno Jun 18 '20 at 16:53
  • @bruno I got your point, I'll go with the if statement you mentioned in your previous comment. One question: when allocating the char array with the malloc() function, what size should I use, since I don't know the exact size? – Joe Jun 18 '20 at 17:10
  • @Joe you can use `realloc` to increase the size of the array when it is too small. Can you edit your question to say more about the input file content and what you want to do with it? – bruno Jun 18 '20 at 18:34
  • @Joe I put an answer with a proposal using `realloc` – bruno Jun 19 '20 at 15:06

1 Answer


You say:

read until end except last byte (which is checksum)

but if I apply your code to :10C00000814202D8BFF32F8F10BD441C42E8004366

    int a=0;

    if((chunk[7] == '0') && (chunk[8] == '0')) {
        size_t length = strlen(chunk);

        while (a < (length-13)) {
            sub[a]=chunk[9+a];
            a++;
        }
    }

sub gets the value 814202D8BFF32F8F10BD441C42E8004, so you remove 366 from the end of the line rather than only the checksum 66.


From your remark

when defining char array using malloc() function, what size should I put in there since I don't know the exact size?

If you want to concatenate all the sub strings into one string, one way is to start with an array of size 1 for the null terminating char, then grow it line by line using realloc. For instance:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char ** argv)
{
  if (argc != 2) {
    fprintf(stderr, "Usage: %s <file>\n", *argv);
    exit(1);
  }

  FILE *fp = fopen(argv[1], "rb");

  if (fp == NULL) {
    perror("Unable to open file!");
    exit(1);
  }

  size_t sz = 0; /* number of data chars stored, not counting the '\0' */
  char * finaldata = malloc(1);

  if (finaldata == NULL) {
    fputs("not enough memory\n", stderr);
    exit(1);
  }
  char chunk[128];

  while (fscanf(fp, " %127s", chunk) == 1) {
    if ((strlen(chunk) > 8) && (chunk[7] == '0') && (chunk[8] == '0')) {
      if (strlen(chunk) != 43) {
        fprintf(stderr, "unexpected line '%s'\n", chunk);
        exit(1);
      }

      chunk[41] = 0; /* remove two last chars */

      char * s = realloc(finaldata, sz + 32 +1); /* + 32 data chars + '\0' */

      if (s == NULL) {
        fputs("not enough memory\n", stderr);
        free(finaldata); /* for valgrind etc */
        exit(1);
      }

      finaldata = s;
      strcpy(finaldata + sz, chunk + 9);
      sz += 32;
    }
  }

  fclose(fp);
  finaldata[sz] = '\0';

  /* debug */
  puts(finaldata);

  free(finaldata); /* for valgrind etc */

  return 0;
}

I use fscanf to skip any whitespace, including newlines, before and after the part to process. In the format " %127s", note the space before '%' and the width 127, which is 128 minus 1 to leave room for the null terminating char.

Compilation and execution:

pi@raspberrypi:/tmp $ gcc -Wall c.c
pi@raspberrypi:/tmp $ cat f
:020000040200F1
:10C00000814202D8BFF32F8F10BD441C42E8004366
:020000040200F1
:10C00000123456789abcdef0123456789abcdef012
pi@raspberrypi:/tmp $ ./a.out f
814202D8BFF32F8F10BD441C42E80043123456789abcdef0123456789abcdef0
pi@raspberrypi:/tmp $ 
bruno
  • Thanks for the detailed answer, I got the dynamic memory allocation now. I have one question though: for "finaldata = s;", why does it only increase the size of the "finaldata" string, and why doesn't it copy the contents of "s" (an empty string) into "finaldata"? – Joe Jun 22 '20 at 12:50
  • About the variable *s*: it is there to be 'clean'. The non-clean version is to directly do `finaldata = realloc(finaldata, sz + 32 +1);`, which loses (leaks) the original block if realloc returns NULL. That form only increases the size of *finaldata*, so afterwards the new part still has to be added with `strcpy(finaldata + sz, chunk + 9);` – bruno Jun 22 '20 at 12:57
  • `finaldata = s;` assigns one pointer to another; this is not a deep copy. Afterwards both variables hold the same address (a pointer is an address). – bruno Jun 22 '20 at 13:00
  • Also, I found out that there is a line in the hex file whose length is not 43 but whose data type is correct, so I created an integer that holds the length of "chunk", and I use that variable to remove the checksum and correctly resize and read the data. Thanks for your help, I'll mark it as answered. – Joe Jun 22 '20 at 13:03