
I have to write a C program that works somewhat like dos2unix: it replaces every CR LF with a single LF (DOS format to Unix format). Here is my approach. Every time I read a line, I search for the end of the data by looking for `\0` and then check whether the following characters are `\r\n`. If so, I replace them with just `\n`. But it doesn't seem to work: the line "CRLF here" is never printed, not even once.

```c
char data[255]; // save the data from in.txt
char *checker;
pf = fopen("in.txt", "r");
pf2 = fopen("out.txt", "w");
while (feof(pf) == 0)
{
    fgets(data, 255, pf);       // Read input data
    checker = data;
    while (checker != "\0") // Search for a new line
    {
        if (checker == "\r\n") // Check if this is CR LF
        {
            printf("CRLF here");
            checker = "\n";   // replace with LF
        }
        checker++;
    }
    fputs(data, pf2);       // Write to output data
}
```
Someonewhohaveacat
  • You need to use `strcmp` (or `strncmp`) to compare strings. `==` compares pointer values. – zerkms Dec 11 '17 at 20:46
  • Please read [this question](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong). – unwind Dec 11 '17 at 20:46
  • Note that the CRLF characters would come before the null byte, not after it. You would also run into problems with long lines — longer than 254 characters. There are ways to write the code so that is simply not an issue. On your current scheme, you need to search for … well, maybe use `strlen()`, or maybe use `strchr()`, or `strcspn()`, or … there are many ways to do it, but the shown code isn't one of them. – Jonathan Leffler Dec 11 '17 at 20:55
  • It's not about comparing strings, it's about opening a file in text mode; it is not a duplicate of the question/answer it was marked against. – Stephan Lechner Dec 11 '17 at 20:56
  • In addition to all the bugs other people have pointed out: you may need to open both files in binary mode (`"rb"`, `"wb"`) to even see CRLFs in the first place, and to prevent the C library from converting `\n` back to `\r\n` on output; you aren't checking for any errors; your logic for _replacing_ `\r\n` with `\n` is incorrect; and you don't handle long lines correctly. – zwol Dec 11 '17 at 20:56
  • If you are converting line feeds, there is no good reason to use `fgets`. That function is line oriented, and if you're worried about the line endings, then pretty much by definition you don't trust your input. And `fgets` is overkill. Read the file one character at a time with `fgetc` or `getc`. You only need to keep track of a small amount of state. (There are 2 states: one in which the previous character read was `\r`, one in which it wasn't.) – William Pursell Dec 11 '17 at 21:13
  • Thanks to everyone who replied to my question. I really appreciate it. – Someonewhohaveacat Dec 11 '17 at 21:26
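William Pursell's two-state suggestion can be sketched as below. This is a minimal sketch, not code from the thread: the function name `convert_two_state` and its return convention (0 on success, -1 on a read or write error) are assumptions, and both streams are expected to be opened in binary mode so the C library does not translate line endings.

```c
#include <stdio.h>

/* Hypothetical helper. Two-state converter: `pending_cr` records whether
   the previous character read was '\r'. CRLF and bare CR both become LF.
   Returns 0 on success, -1 on a read or write error. */
int convert_two_state(FILE *in, FILE *out)
{
    int c;               /* int, so it can hold EOF as well as every byte */
    int pending_cr = 0;  /* state: was the previous character '\r'? */

    while ((c = getc(in)) != EOF) {
        if (c == '\r') {
            /* Emit the LF for this CR now; remember it so that a
               following LF (the second half of CRLF) is skipped. */
            if (putc('\n', out) == EOF)
                return -1;
            pending_cr = 1;
            continue;
        }
        if (c == '\n' && pending_cr) {
            pending_cr = 0;  /* LF of a CRLF pair: already emitted */
            continue;
        }
        pending_cr = 0;
        if (putc(c, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;  /* EOF from getc may mean a read error */
}
```

Because the state lives in `pending_cr`, each input character is read exactly once; there is no read-ahead and nothing to push back.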


You have a whole bunch of bugs:

  • You may need to open in.txt in "rb" mode, instead of "r" mode, to see the CRLF line endings in the first place.
  • You may need to open out.txt in "wb" mode, instead of "w" mode, to prevent the C library from undoing your work.
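  • You cannot compare strings with ==. `checker != "\0"` and `checker == "\r\n"` compare the pointer `checker` with the address of a string literal, not the characters it points to. You can compare one character of a string to a character literal with ==, but that's not what you're doing, and it only works for single characters; a CRLF sequence is two characters.
  • Assigning `checker = "\n"` only changes where `checker` points; it does not modify `data` at all. And you cannot replace a two-character sequence with a one-character sequence within a mutable C string by simple assignment anyway. You would need to use memmove to shift all the characters after the replacement down one.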
  • You do not properly handle very long lines.
  • You do not check whether fopen succeeded, or for any other I/O errors.
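  • while (!feof (fp)) is always wrong: feof only reports end-of-file after a read has already failed. When the file ends with a newline, your loop calls fgets one extra time, that failing call leaves data unchanged, and the last line is written to out.txt twice.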
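If you would rather keep the question's read-a-buffer approach, one alternative to calling `memmove` for every match is to compact the string with separate read and write indexes, so each character is copied at most once. This is a swapped-in technique rather than one proposed in the answer, and `crlf_to_lf_inplace` is a hypothetical name. It applies the rule described below (CRLF and bare CR both become LF). One caveat: if a line is longer than the `fgets` buffer and the buffer happens to end on the CR, the LF arrives in the next buffer and the pair becomes two LFs, so this is only safe on complete lines or on a whole file held in memory.

```c
#include <stddef.h>

/* Hypothetical helper: converts a NUL-terminated string in place.
   Every "\r\n" and every bare '\r' becomes '\n'. Returns the new length.
   The write index w never passes the read index r, so no character is
   overwritten before it has been read. */
size_t crlf_to_lf_inplace(char *s)
{
    size_t r = 0, w = 0;

    while (s[r] != '\0') {
        if (s[r] == '\r') {
            s[w++] = '\n';   /* CR (alone or in CRLF) becomes LF */
            r++;
            if (s[r] == '\n')
                r++;         /* skip the LF of a CRLF pair */
        } else {
            s[w++] = s[r++]; /* ordinary character: copy down */
        }
    }
    s[w] = '\0';
    return w;
}
```

In the question's loop, this would be called on `data` after a successful `fgets` and before `fputs`.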

A better way to write this program is with a main loop that goes character by character, something like:

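```c
#include <stdio.h>

/* Copy ifp to ofp, turning both CRLF and bare CR into LF.
   (The function name is only for illustration.) */
static void convert(FILE *ifp, FILE *ofp)
{
  int c;
  while ((c = getc(ifp)) != EOF) {
    if (c == '\r') {
      putc('\n', ofp);          /* every CR becomes exactly one LF */
      c = getc(ifp);
      if (c == EOF) break;
      if (c != '\n')
        ungetc(c, ifp);         /* bare CR: reprocess c, which may be another CR */
      continue;                 /* for CRLF, the LF is dropped */
    }
    putc(c, ofp);
  }
}
```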

This converts both \r\n and bare \r to \n. Bare \r is very rare nowadays, but it was the line terminator on some historical operating systems (notably classic Mac OS), and there isn't anything else sensible to do with it.

It's important that c be an int, not a char, so that it can hold EOF as well as all possible characters.
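To see why, consider a stream containing a 0xFF byte. `getc` returns it as the int 255, which is distinct from EOF (a negative value). If `c` were a plain `char`, then on common platforms where `char` is signed that byte would typically become -1 and compare equal to EOF (commonly -1), ending the loop early; where `char` is unsigned, `c` could never equal EOF and the loop would never end. A minimal sketch, with the hypothetical function name `count_bytes`:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a stream.
   getc returns each byte as an unsigned char value converted to int,
   or the negative value EOF, so an int keeps them all distinct. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;  /* not char: a 0xFF byte must not be confused with EOF */

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```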

zwol
  • Note, re: "You do not check for I/O errors": this code does check for input errors. It does not check for output errors (which are quite rare). – chux - Reinstate Monica Dec 12 '17 at 00:09
  • @chux Not in my experience -- but mostly what I meant was they're not checking the result of `fopen`. – zwol Dec 12 '17 at 01:27
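Following up on these comments, here is a minimal sketch that adds the checks discussed: the results of `fopen`, input errors via `ferror`, and output errors via the return values of `putc` and `fclose` (which flushes buffered output, so a late write failure is reported there). The helper name `dos2unix_file`, its 0/-1 return convention, and the use of `perror` for messages are assumptions, not part of the answer.

```c
#include <stdio.h>

/* Hypothetical helper: converts the file at `inpath` to LF line endings
   and writes the result to `outpath`. CRLF and bare CR both become LF.
   Returns 0 on success, -1 on any open, read, or write error. */
int dos2unix_file(const char *inpath, const char *outpath)
{
    FILE *ifp, *ofp;
    int c, status = 0;

    /* Binary mode on both sides, so the C library neither hides the
       CRs on input nor adds them back on output. */
    ifp = fopen(inpath, "rb");
    if (ifp == NULL) {
        perror(inpath);
        return -1;
    }
    ofp = fopen(outpath, "wb");
    if (ofp == NULL) {
        perror(outpath);
        fclose(ifp);
        return -1;
    }

    /* Loop on the result of getc, not on feof(). */
    while ((c = getc(ifp)) != EOF) {
        if (c == '\r') {
            int next = getc(ifp);
            if (next != '\n' && next != EOF)
                ungetc(next, ifp);  /* bare CR: reprocess the next char */
            c = '\n';               /* CRLF or bare CR becomes LF */
        }
        if (putc(c, ofp) == EOF) {  /* output error */
            status = -1;
            break;
        }
    }

    if (ferror(ifp))                /* getc stopped because of a read error */
        status = -1;
    if (fclose(ofp) == EOF)         /* flush; a late write error shows up here */
        status = -1;
    fclose(ifp);
    return status;
}
```

A `main` would typically call `dos2unix_file("in.txt", "out.txt")` and return `EXIT_FAILURE` if the result is -1.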