2

can anyone help me optimalize code for reading standard input. Here it is what I have now:

unsigned char *msg;
size_t msgBytes = 0;
size_t inputMsgBuffLen = 1024;
if ( (msg = (unsigned char *) malloc(sizeof(unsigned char) * inputMsgBuffLen) ) == NULL ) {
    quitErr("Couldn't allocate memmory!", EXIT_FAILURE);
}
for (int c; (c = getchar()) != EOF; msgBytes++) {
    if (msgBytes >= (inputMsgBuffLen)) {
        inputMsgBuffLen <<= 1;
        if ( ( msg = (unsigned char *)realloc(msg, sizeof(unsigned char) * inputMsgBuffLen) ) == NULL) {
            free(msg);
            quitErr("Couldn't allocate more memmory!", EXIT_FAILURE);
        }
    }
    msg[msgBytes] = (unsigned char)c;
}
Donato Szilagyi
  • 4,279
  • 4
  • 36
  • 53
pierre tautou
  • 807
  • 2
  • 20
  • 37
  • 3
    Don't cast the return value of `malloc()` or `realloc()`. It serves no useful purpose and can hide errors. `sizeof (unsigned char)` is, by definition, `1`. Remove all multiplications by it. – pmg Apr 12 '11 at 17:10
  • 1
    If realloc fails and msg is null, then the following free() is a no-op, and you have a memory leak (you don't free the original msg). Since you are immediately exiting, you probably don't care. – William Pursell Apr 12 '11 at 17:11
  • What sort of optimization are you looking for? Why do you need it? What are your requirements for what the code is supposed to do? Do you really need to read the whole file into memory? – David Thornley Apr 12 '11 at 17:12
  • Is this going to be reading file pipes or user input? – Albert Perrien Apr 12 '11 at 17:38

3 Answers3

5

Question: are you reading binary or text data from stdin? If text, why are you using unsigned char?

Some advice:

  1. Drop all the casts on malloc and realloc; they aren't necessary and clutter up the code;
  2. Instead of repeatedly calling getchar, use fread or fgets (depending on whether you're reading binary or text);
  3. Remember that realloc can potentially return NULL, so you want to assign the result to a temporary value, otherwise you'll lose track of the original pointer and wind up leaking memory;
  4. Use a statically allocated buffer for each chunk of input;
  5. Use sizeof on objects, not types; it's a little cleaner, and it protects you in case the types change (e.g., T *p = malloc(sizeof *p * number_of_elements);.

Cleaned-up version assuming you intend to use unsigned chars:

#define inputBufSize 1024

unsigned char *msg = NULL;
size_t msgBytes = 0;
size_t inputMsgBufSize = 0;
unsigned char inputBuffer[inputBufSize];
size_t bytesRead = 0;

while ((bytesRead = fread(
    inputBuffer,            // target buffer
    sizeof inputBuffer,     // number of bytes in buffer
    1,                      // number of buffer-sized elements to read
    stdin)) > 0)
{
  unsigned char *tmp = realloc(msg, inputMsgBufSize + bytesRead));
  if (tmp)
  {
    msg = tmp;
    memmove(&msg[inputMsgBufSize], inputBuffer, bytesRead);
    inputMsgBufSize += bytesRead;
  }
  else
  {
    printf("Ran out of memory\n");
    free(msg);
    break;
  }
}
John Bode
  • 119,563
  • 19
  • 122
  • 198
  • This one is a proper solution. Vote up. – Athabaska Dick Apr 12 '11 at 17:50
  • I try your code, and it is not working for me. I compile it and at cmd line put **~# echo 123 | ./a.out** The condition on while (bytesRead) was 0, so program quit! – pierre tautou Apr 12 '11 at 18:08
  • @pierre: try running it as `./a.out < 123` (I'm not used to working with pipes). I didn't intend this to be a complete solution, just as an example of one way to do this sort of thing. – John Bode Apr 12 '11 at 18:47
2

Try to read fixed chunks of at least 8192 bytes. Don't use single char input since it's quite slow.

piotr
  • 5,657
  • 1
  • 35
  • 60
0

Why do you want to "optimalize" the code?

Did you time it?
Did you find it was too slow?
Are you ready to time the new versions?
Do you realize timing run time of code is dependent of many many factors (like current processor load, number of active users, disk activity, ..., ...)

The best optimization you can do, is start with a very large value for malloc (and possibly realloc down after all data has been read).

size_t inputMsgBuffLen = 400000000; /* approx 400 mega */
pmg
  • 106,608
  • 13
  • 126
  • 198
  • Yes I did. I have preformance: 76.355209 [MB/s]. I try to change size of input buffer, but, the performance is the same. Is there better way to read the data from stdin to buffer like by character? I think read stdin by chunk of data? – pierre tautou Apr 12 '11 at 17:48