0

I want to know how to declare storage of exactly the right size in C. Whether I use an array or allocate memory with malloc, I have to decide the size in advance. In this situation I declare a very large size to prevent overflow, but an overflow can still happen.

For example

If I want to split a text file into words, I need to declare a char ** to store the word strings, but I can't know in advance how many words there will be.

If I want to read the file content into an array, I need to declare a large buffer to store it:

buffer = malloc(sizeof(char)*1000);

Is there a better or more correct solution? Thanks.

#include <stdio.h>
#include <stdlib.h>

void read_chars(char * file_name ,char * buffer);

int main(int argc ,char * argv[])
{
    char * buffer ;
    buffer = malloc(sizeof(char)*1000);
    read_chars(argv[1],buffer);
    printf("%s",buffer);
}

void read_chars(char * file_name ,char * buffer)
{
    FILE * input_file ;
    input_file = fopen(file_name,"r");
    int i = 0;
    char ch;
    while((ch = fgetc(input_file)) != EOF)
    {
        *(buffer+i) = ch;
        i++;
    }
    *(buffer+i) = '\0';
    fclose(input_file);
}
user2131116

2 Answers

4

The point of a buffer is (usually) to be a fixed size and allow you to read data in chunks. If you are reading a file then you shouldn't hold it all in memory unless you know the size of the file and it's not too big.

Declare a buffer size, traditionally a power of two, like 2048, and read the file into it in chunks, then run your logic on the chunk each time you read a block. You then use constant memory, can read any size file, and don't have to guess.

A downside is that you may have issues working with items that overlap the boundaries of buffers. You may have to work harder to get your logic to work in these cases.
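As a rough sketch of the chunked approach, the following counts whitespace-separated words with a fixed 2048-byte buffer. The function name `count_words` and the `CHUNK_SIZE` constant are made up for illustration; the point is that the `in_word` flag is carried from one chunk to the next, so a word that straddles a buffer boundary is counted only once.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define CHUNK_SIZE 2048  /* illustrative size; any reasonable value works */

/* Hypothetical helper: count whitespace-separated words in a file while
   holding only CHUNK_SIZE bytes in memory at a time.
   Returns the word count, or -1 if the file cannot be opened. */
long count_words(const char *file_name)
{
    assert(file_name != NULL);

    FILE *fp = fopen(file_name, "r");
    if (fp == NULL)
        return -1;

    char buffer[CHUNK_SIZE];
    size_t n;
    long words = 0;
    int in_word = 0;  /* state kept across chunks to handle boundary overlap */

    /* fread returns how many bytes it actually stored, so the last,
       shorter chunk is handled by the same loop. */
    while ((n = fread(buffer, 1, sizeof buffer, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (isspace((unsigned char)buffer[i])) {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;  /* first character of a new word */
                words++;
            }
        }
    }

    fclose(fp);
    return words;
}
```

Memory use stays constant regardless of file size. Logic that needs the full text of each word, rather than just a count, would have to keep the partial word from the end of one chunk and join it with the start of the next.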

Alternatively, look at mmap to virtually map the whole file into memory (you still have to know how big it is, though, but you can get the file's size up front).
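A minimal sketch of the mmap route, assuming a POSIX system: `fstat` supplies the file size up front and `mmap` maps the file read-only. The function `count_newlines_mmap` is a hypothetical example, not part of any library, and it only counts newline bytes to show the mapped data being read like an ordinary array.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): count '\n' bytes in a file by mapping
   it into memory. Returns the count, or -1 on any error. */
long count_newlines_mmap(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {  /* get the file size up front */
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {       /* a zero-length mapping is an error */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping remains valid after close */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            count++;

    munmap(data, len);
    return count;
}
```

The size is sampled once, so this sketch does not account for another process changing the file while it is mapped.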

Joe
  • Concerning the idea of "get the files size up-front", does opening a file with `"r"`, prevent other program from appending data? (somehow changing the file's length.) – chux - Reinstate Monica Dec 16 '13 at 19:00
  • 1
    @chux: Depends on the OS and/or filesystem. IIRC Windows won't let you write to a file that's being used by another process, unless you both open it specifically with sharing permissions specified. Linux is typically a lot less strict in that regard. (Because of how most *nix filesystems work, in some cases you can even write to a file that's been deleted! Or, delete a file that another process is writing to.) – cHao Dec 17 '13 at 01:03
4

An answer after an accepted answer:

1) A classic attack on systems today is the buffer overrun. If your system can handle 1000 bytes, someone will try 1001. So rather than a solution that can deal with an arbitrarily large buffer, define an upper limit geared to the task. If one is looking for a "name", 1024 bytes should work. See long name. This size should be easy to adjust should the code need rework. Longer values are likely attacks and need not be handled normally. They should be detected and declared invalid input instead.

2) Don't miss the forest for the trees. I found it interesting that OP's code has a classic error. Should fgetc() return the legal value of 255 and that value be assigned to a char ch, ch may compare equal to EOF and stop the loop early (when char is signed), or never compare equal to EOF at all (when char is unsigned). In all this discussion about buffer size, the size of ch was too small.

// char ch;
int ch;
while((ch = fgetc(input_file)) != EOF)

3) read_chars() should have had the buffer size passed to it so the function could use that information: read_chars(argv[1], buffer, 1000).
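Putting the three points together, a bounded version might look like the sketch below. The name `read_chars_bounded` and its return codes are assumptions made for illustration; the original only suggests passing the size as a third argument.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of read_chars() that is told the buffer size.
   Returns the number of characters stored (excluding the '\0'),
   -1 if the file cannot be opened, or
   -2 if the content does not fit, which is treated as invalid input. */
long read_chars_bounded(const char *file_name, char *buffer, size_t size)
{
    assert(file_name != NULL && buffer != NULL && size > 0);

    FILE *input_file = fopen(file_name, "r");
    if (input_file == NULL)
        return -1;

    size_t i = 0;
    int ch;  /* int, not char, so EOF stays distinct from every byte value */
    while ((ch = fgetc(input_file)) != EOF) {
        if (i + 1 >= size) {   /* keep one slot for the terminating '\0' */
            fclose(input_file);
            buffer[0] = '\0';  /* leave a valid empty string behind */
            return -2;
        }
        buffer[i++] = (char)ch;
    }
    buffer[i] = '\0';

    fclose(input_file);
    return (long)i;
}
```

In OP's main(), the call would become something like `read_chars_bounded(argv[1], buffer, 1000)`, with the return value checked before printing the buffer.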

chux - Reinstate Monica