
This question comes from a discussion about a practical problem, "Replacing multiple new lines in a file with just one". Something went wrong while using a Cygwin terminal running on a Windows 8.1 machine.

Since the end-of-line terminator can differ (\n, \r, or \r\n), is it necessary to write a "portable" version of if(c=='\n') for it to work properly on Linux, Windows and OS X? Or is the best practice just to convert the file with commands/tools?

    #include <stdio.h>
    int main ()
    {
      FILE * pFile;
      int c;
      int n = 0;
      pFile=fopen ("myfile.txt","r");
      if (pFile==NULL) perror ("Error opening file");
      else
      {
        do {
          c = fgetc (pFile);
          if (c == '\n') n++; // will it work fine under different platform?
        } while (c != EOF);
        fclose (pFile);
        printf ("The file contains %d lines.\n",n);
      }
      return 0;
    }

Update 1:

Will the CRT always convert line endings into '\n'?

Eric Tsui
  • CRT will always convert line endings into `'\n'`, so if you read the file as text, it'll automatically be portable, unless you want to read a file from any platform with any type of line endings. – phuclv Jun 26 '15 at 09:23
  • @LưuVĩnhPhúc Thanks. And if I'm reading a file (created on Windows) under Linux, is the best practice to rewrite `if(c=='\n')`, or just to convert the file with commands/tools? – Eric Tsui Jun 26 '15 at 09:31
  • @LưuVĩnhPhúc Wrong (in general). Line endings are only translated on Microsoft platforms, if the file/stream is opened in "ASCII mode" (which only *exists* on Microsoft platforms). – joop Jun 26 '15 at 09:39
  • @joop you're wrong. The C standard requires that "When writing a file in text mode, **'\n' is transparently translated to the native newline sequence used by the system**, which may be longer than one character. When reading in text mode, **the native newline sequence is translated back to '\n'**." https://en.wikipedia.org/wiki/Newline#In_programming_languages Moreover, Windows is neither the first nor the only system to use CR-LF as its newline sequence. Any system that doesn't use `'\n'` as its newline (such as classic Mac or ACORN) must convert its newline into `'\n'` anyway. – phuclv Jun 26 '15 at 11:15
  • @EricTsui *Portable* means that the same code, when compiled on different platforms, runs properly on each of those platforms. It doesn't mean that you can work with files from other platforms without problems. So if you work with files from Windows, you need to process them manually: `c=='\n'` alone won't handle them, because the Linux CRT doesn't know Windows' (or any other system's) line endings and won't convert them. Whether to convert the file first or to check in code is up to you. – phuclv Jun 26 '15 at 11:22
  • As a user, I almost always appreciate it when a program that reads text files treats either `cr/lf` or a plain `lf` as a line ending. When writing text files, the default system line ending should be used (except when writing to the same file, or when specific options to override that default are provided and used). – Michael Burr Jun 26 '15 at 23:31

1 Answer


If an input file is opened in binary mode (the character 'b' in the mode string), then it is necessary to worry about the possible presence of '\r' before '\n'.

If the file is not opened in binary mode, then it is not necessary to worry about the presence of '\r' before '\n', as long as the file uses the native line endings of the system reading it. The translation is handled before the input is received by your code, either by a relevant system function (e.g. a device driver that reads input from disk, or from stdin) or by the implementation of the functions you use to read input from the file. This also applies to fread(): the C standard specifies that it reads as if by repeated calls to fgetc(), so on a text stream it receives translated input too.

If you are transferring files between systems (e.g. writing the file under Linux and transferring it to a Windows system, where a program tries to read it in), then you have several options:

  • Write and read the file in non-binary mode, and do the relevant translation of the file when transferring it between systems. If using FTP, this can be handled by transferring the file in text mode rather than binary mode. If the file is transferred in binary mode, then you will need to run the file through dos2unix (if transferring the file to Unix) or through unix2dos (going the other way).
  • Do all your I/O in binary mode, transfer the files between systems in binary mode, and never read them in non-binary mode. Among other things, this gives you explicit control over what data is in the file.
  • Write your file in text mode and transfer it as you see fit. Then read it only in binary mode and, when your reading code encounters a \r\n pair, drop the '\r' character.

The last is arguably the most robust: the writing code might include \r before \n characters, or it might not, but the reading code simply ignores any '\r' character that it encounters before a '\n' character. Such code will probably even cope if the files are edited by hand (e.g. with a text editor, which might be separately configured to either insert or remove \r and \n) before being read.

Peter