1

To learn some low level C magic, I tried to implement different methods to copy the content of one file to another.

The following method uses a buffer, where I save chunks of data, e.g. 512 bytes. Then I write FROM this buffer into my destination file. Allegedly this is more efficient than copying the whole file directly.

When I use a small buffer, e.g. 512 Bytes, this works fine. If I use a big buffer, e.g. 10000000 Bytes, roughly 10MB, I get a segmentation fault.

I am running on Ubuntu. I tried stuff with malloc, but nothing really seams to work for me. I guess the fix will be easy, once I know it.

According to my debbuger the error is in this line:

if (stat(to, &statBuffer) == -1)

I left in a lot of the error checking I am doing. Here is the code I run:

void copyUsingBuffer(char from[], char to[], long unsigned int bufferSize)
{
    struct stat statBuffer;
    int sourceFD, destFD;
    char buffer[bufferSize / sizeof(char)];
    int EndOfFileErreicht = 0;

    // Dateien pruefen
    if (stat(to, &statBuffer) == -1)
    {
        if (errno != ENOENT)
        {
            perror("Fehler beim Pruefen der Zieldatei");
            exit(EXIT_FAILURE);
        }
    }
    else
    {
        if (!(S_ISREG(statBuffer.st_mode)))
        {
            printf("Die Zieldatei ist keine regulaere Datei\n");
            exit(EXIT_FAILURE);
        }
    }

    // Dateien oeffnen
    sourceFD = open(from, O_RDONLY);
    if (sourceFD == -1)
    {
        perror("Fehler beim Oeffnen der Quelldatei");
        exit(EXIT_FAILURE);
    }
    if (fstat(sourceFD, &statBuffer) == -1)
    {
        perror("Fehler beim Pruefen der Quelldatei");
        exit(EXIT_FAILURE);
    }
    if (!(S_ISREG(statBuffer.st_mode)))
    {
        printf("Die Quelldatei ist keine regulaere Datei\n");
        exit(EXIT_FAILURE);
    }

    destFD = open(to, O_WRONLY | O_CREAT | O_TRUNC, statBuffer.st_mode);
    if (destFD == -1)
    {
        perror("Fehler beim Oeffnen der Zieldatei");
        exit(EXIT_FAILURE);
    }

    while (EndOfFileErreicht == 0)
    {
        int geleseneBytes = read(sourceFD, buffer, sizeof(buffer));
        if (geleseneBytes == -1)
        {
            perror("Fehler beim Lesen aus der Quelldatei");
            exit(EXIT_FAILURE);
        }
        if (geleseneBytes < bufferSize)
        {
            // EOF wurde erreicht, also Flag setzen
            EndOfFileErreicht = 1;
        }


        if (write(destFD, buffer, geleseneBytes) == -1)
        {
            perror("Fehler beim Schreiben in die Zieldatei");
            exit(EXIT_FAILURE);
        }
    }

    close(sourceFD);
    close(destFD);
}

I would expect that if I enter a 500MB text file, that my program copies this file to its destination.

I realise that there are different ways to implement file copying. I already implemented some of those. In this case I am restricted to using write, read, and the buffer. Also bufferSize NEEDS to be a variable. The naive way i have using malloc would be:

 char* buffer = malloc(bufferSize);

But when i use this the file will be corrupted.

Regicide
  • 73
  • 5
  • 5
    `char buffer[bufferSize / sizeof(char)];` - how much stack space do you think you have? – SergeyA Jun 25 '19 at 18:55
  • On a side note, I am not sure why people are saying this questions lacks MCVE. In my view, it does have correct MCVE. – SergeyA Jun 25 '19 at 18:56
  • You should wait till `read()` returns 0 before assuming you're at the end of the file. – Shawn Jun 25 '19 at 18:56
  • Are you exceeding your stack size limit? – Christian Gibbons Jun 25 '19 at 18:57
  • Is there a way for me to check how much stack space i have? i have 16GM of ram. – Regicide Jun 25 '19 at 18:58
  • 3
    I would use `malloc` instead of a VLA. – 001 Jun 25 '19 at 19:00
  • Stack space is generally very short. On a typical desktop PC you should expect only 1 MB to 10 MB of stack no matter how much RAM your system has. In other words, take @JohnnyMopp 's advice about using `malloc` – user4581301 Jun 25 '19 at 19:00
  • Is there a way for me to increase my stack space? So i can check if the problem really is that the file is too large? – Regicide Jun 25 '19 at 19:02
  • 1
    `bufferSize / sizeof(char)`... `sizeof(char)` is by definition 1, so that division is pointless. And, yes, unless `bufferSize` is always going to be a fairly small value, you'll probably want to use `malloc()` instead of a VLA. – Shawn Jun 25 '19 at 19:02
  • 1
    It is not the file that is too large, it is `bufferSize` that is most likely too big. – 001 Jun 25 '19 at 19:04
  • 1
    @Regicide `ulimit -a` will tell you how much stack space a program can use. You can try increasing it too. – EOF Jun 25 '19 at 19:04
  • Setting the stack size is an implementation detail. It changes from compiler to compiler. Heck, C++ doesn't even require the system to have a stack. The language is specified such that if you can find a way to do it, you can use hamster wheels. – user4581301 Jun 25 '19 at 19:04
  • Assuming a gcc compiler because of Ububntu, so [Change stack size for a C++ application in Linux during compilation with GNU compiler](https://stackoverflow.com/questions/2275550/change-stack-size-for-a-c-application-in-linux-during-compilation-with-gnu-com) probably applies. Since you don't know how much stack you need when compiling, `malloc` is still the better option. – user4581301 Jun 25 '19 at 19:06
  • As a matter of fact, if you are going to allocate as much memory as the file size, you will end up in trouble sooner or later. You need to allocate fixed buffer size, and than reuse this buffer for multiple reads. – SergeyA Jun 25 '19 at 19:08
  • Im currently working to see if its a stack related problem. @user4581301 how would i solve this correctly with malloc? i tried a few things using it but i got the same errors all the time – Regicide Jun 25 '19 at 19:08
  • When you used `malloc` did you remember to `free` your buffer when you were done with it? – Christian Gibbons Jun 25 '19 at 19:10
  • Yes i did free the buffer. Also: Currently it is a stack-problem. When i give a extremely high max stack size, i tried 10GB, the programm works. Obviously thats not an ideal solution then. – Regicide Jun 25 '19 at 19:11
  • All you are doing is reading file X and writing the contents to file y, yes? You can read and write smaller chunks. There should be no reason you can't do this reading the file, say, 4kb at a time, writing the same 4 kb and then looping back around to get another 4k. You don't even need to know the file size with this approach, when you can't read any more data from the source file, you're done. – user4581301 Jun 25 '19 at 19:19
  • i am limited by the fact that the user is supposed to be able to enter the buffersize. Thats the reason it is a variable at all. So if i enter that the buffer should be 10MB big, it has to be 10MB big. So what i wouldwant to be able to do is create, with malloc, an array, thats as big as the variable "bufferSize" – Regicide Jun 25 '19 at 19:22

3 Answers3

0

Found the solution, after trying a few things. I replace

char buffer[bufferSize / sizeof(char)];

with

char* buffer = malloc(bufferSize);

Using malloc was already suggested. MY mistake was that i didnt replace

int geleseneBytes = read(sourceFD, buffer, sizeof(buffer));

Making those changes my program works for all file-sizes. Thanks for the help everybody!

int geleseneBytes = read(sourceFD, buffer, bufferSize);
Regicide
  • 73
  • 5
  • 1
    If you do that, make sure you use `free(buffer);` before the function returns. Otherwise, your program will have a memory leak every time the function is called. – R Sahu Jun 25 '19 at 20:14
0

You need to also change the line:

int geleseneBytes = read(sourceFD, buffer, sizeof(buffer));

To:
    int geleseneBytes = read(sourceFD, buffer, bufferSize);

Because the sizeof(buffer) is the size of a pointer, not the size of what it points to. Since you are malloc()ing it, you need to provide this.

ps:

char buffer[bufferSize/sizeof(char)];

is a bit cring inducing. Sizeof(char) is 1.

mevets
  • 10,070
  • 1
  • 21
  • 33
  • I already edited this in my answer, i just hit Enter too fast. Thanks nonetheless. Well, i guess we all did cringe-inducing errors at some point. The important thing is to not repeat them i guess. :) – Regicide Jun 25 '19 at 20:15
-1

If someone has a better solution, e.g. something with malloc, it would also be a big help.

Option 1:

Don't make bufferSize an input to the function. Use a hard-coded size that you know will not cause buffer overflow, such as std::numeric_limits<unsigned short>::max().

Option 2:

Use std::vector insteady of an array.

// char buffer[bufferSize / sizeof(char)];
std::vector<char> buffer(bufferSize);

Then use buffer.data() when you need to access the data of the object.

Update (since C++ tag was removed)

Don't make bufferSize an input to the function. Use a hard-coded size that you know willnot cause buffer overflow, such as USHORT_MAX.

void copyUsingBuffer(char from[], char to[])
{
    int bufferSize = USHORT_MAX
    char buffer[USHORT_MAX];

    struct stat statBuffer;
    int sourceFD, destFD;

    ...
}
R Sahu
  • 204,454
  • 14
  • 159
  • 270
  • 1
    Question is tagged [tag:c] – 001 Jun 25 '19 at 19:29
  • Option 1 is not possible, as it is a requirement for bufferSize to be a variable. Option 2 is not possible, as vectors arent in c. Thank you anyway :) – Regicide Jun 25 '19 at 19:30
  • Now it is tagged C. I found it the same way R Sahu did: It WAS tagged C++. If you're monitoring SO for new C++ questions, it shows up. If you didn't notice the tag was removed... Well, you probably give a C++ answer. – user4581301 Jun 25 '19 at 20:06