
I wrote a simple program that copies the contents of one file into another. Here is the code:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char filename[1024];
    scanf("%s", &filename);

    // printf("Filename: '%s'\n", filename);

    int bytesToModify; scanf("%d", &bytesToModify);

    FILE *fp;

    fp = fopen(filename, "r");
    fseek(fp, 0, SEEK_END);
    int fSize = ftell(fp);
    fseek(fp, 0, SEEK_SET);

    printf("%d\n", fSize);

    char *buf = malloc(fSize*sizeof(char));

    for (int i = 0; i < fSize; i++) {
        buf[i] = getc(fp);
    }
    fclose(fp);

    FILE *fo;

    fo = fopen("out_file.txt", "w");
    for (int i = 0; i < fSize; i++) {
        fwrite(&buf[i], 1, 1, fo);
    }
    fclose(fo);

    return 0;
}

Even with a small file I can see the artifact: the Cyrillic symbol 'я' appears at the end of the output file. If I try to copy an executable file, 99% of the file turns into these symbols. What is wrong with my code?

I'm using Code::Blocks with the GCC compiler, version 10.1.0. My operating system is Windows 10.

Thanks for your help.

Зелди
  • `fp = fopen(filename, "r");` ==> `fp = fopen(filename, "rb");` and similarly for `"out_file.txt"` – pmg Sep 30 '20 at 12:55
  • The value returned by `ftell` is a `long`, not an `int`; the same goes for the loop counter `i`. – alinsoar Sep 30 '20 at 12:56
  • Whenever doing any sort of IO you need to be careful that you watch the encoding which is set via the [locale](https://en.cppreference.com/w/c/locale/setlocale). Valid [locales on windows](https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=vs-2019) are a bit funky but still work. – Mgetz Sep 30 '20 at 13:10

1 Answer

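  1. You did not open the files in binary mode: "rb" and "wb". In text mode on Windows, getc therefore turns every \r\n into a single \n.

  2. For each line terminator, one character fewer is read than ftell reported. You keep reading up to fSize nevertheless, so getc eventually returns EOF (note that getc returns an int, not a char). EOF has the value -1 on Windows; stored in a char and written to the file, it becomes the byte 0xFF, which is displayed as я in the encoding you're using in Notepad (most likely Windows-1251). With an executable, text-mode reading on Windows also treats a 0x1A byte (Ctrl+Z) as end of file, which is probably why almost the whole output consists of that symbol.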
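
To make point 2 concrete, here is a minimal sketch of what happens to the value returned by getc when it is stored in a char and then written out. Both function names (`byte_written_for`, `read_byte`) are made up for this illustration and do not come from the question; the sketch also assumes EOF is -1, as it is on Windows and other common platforms.

```c
#include <stdio.h>

/* Hypothetical helper: the byte that ends up in the output file when
   the int returned by getc() is stored in a char and written back. */
unsigned char byte_written_for(int getc_result)
{
    char stored = (char)getc_result;  /* what `buf[i] = getc(fp);` does */
    return (unsigned char)stored;     /* the byte fwrite puts in the file */
}

/* Hypothetical helper: reads one byte, testing for EOF while the
   result is still an int. Returns 1 on success, 0 at end of input. */
int read_byte(FILE *fp, unsigned char *out)
{
    int c = getc(fp);
    if (c == EOF)
        return 0;
    *out = (unsigned char)c;
    return 1;
}
```

With EOF equal to -1, `byte_written_for(EOF)` yields 0xFF, the byte that Windows-1251 renders as я. Checking the int against EOF before narrowing it, as `read_byte` does, avoids writing that byte in the first place.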

Furthermore, since you're using fwrite, you could just as well use fread. There is no need to read and write the characters one at a time; just use

char *buf = malloc(fSize);
size_t bytesRead = fread(buf, 1, fSize, fp);  // fread returns a size_t: the number of bytes actually read
fclose(fp);

and

size_t bytesWritten = fwrite(buf, 1, bytesRead, fo);  // write only as many bytes as were read
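
Putting the pieces together, a copy routine could look roughly like the sketch below. This is one possible arrangement, not the only one: the function name `copy_file` and the 4096-byte buffer are my own choices, and it reads in fixed-size chunks instead of loading the whole file, which sidesteps the ftell size question entirely.

```c
#include <stdio.h>

/* Hypothetical helper: copies src to dst byte for byte.
   Returns 0 on success, -1 on any failure. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");   /* binary mode: no \r\n translation */
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[4096];                /* arbitrary chunk size */
    size_t n;
    int status = 0;

    /* fread reports how many bytes it actually delivered, so the loop
       never writes more than was read and no EOF value is stored. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = -1;           /* short write */
            break;
        }
    }
    if (ferror(in))
        status = -1;               /* read error rather than end of file */

    fclose(in);
    if (fclose(out) != 0)          /* flushing buffered data can also fail */
        status = -1;
    return status;
}
```

In the program from the question, the two loops would then reduce to a call such as `copy_file(filename, "out_file.txt")`, with the return value checked before reporting success.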