0

I'd like to parse a simple text file in a C program and react to every line feed in it. Unfortunately, checking with `character == '\n'` does not always work.

I know there are different ways to encode a line feed (e.g. 0x0A in ASCII), so my question is: is there a safe way to check whether a character is an LF or not?

unwind
user3085931

4 Answers

4

Ok here is a list of newlines per operating system type:

Linux systems: LF (line feed, '\n', 0x0A, 10 in decimal)

Unix systems: LF (line feed, '\n', 0x0A, 10 in decimal)

Windows systems: CR followed by LF (CR+LF, '\r\n', 0x0D 0x0A)

Mac OS systems: LF (line feed, '\n', 0x0A) on Mac OS X and later; classic Mac OS (version 9 and earlier) used CR ('\r', 0x0D)

Android systems: LF (line feed, '\n', 0x0A, 10 in decimal)

Unicode: the Unicode standard defines a number of characters that conforming applications should recognize as line terminators:

LF:    Line Feed, U+000A
VT:    Vertical Tab, U+000B
FF:    Form Feed, U+000C
CR:    Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL:   Next Line, U+0085
LS:    Line Separator, U+2028
PS:    Paragraph Separator, U+2029

Based on: http://en.wikipedia.org/wiki/Newline
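For the three ASCII conventions in the list above (LF, CR, and CR+LF), a single pass over the data is enough to treat each of them as one line ending. The following is only a minimal sketch: the function name `replace_line_endings` and its signature are made up for illustration, and it deliberately ignores the Unicode-only terminators (NEL, LS, PS), which would need an encoding-aware approach.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are illustrative only).
 * Replaces every LF, CR, or CR+LF in buf[0..len) with one `repl`
 * character, in place, and returns the new length. All other bytes
 * are copied unchanged. */
size_t replace_line_endings(char *buf, size_t len, char repl)
{
    size_t out = 0;

    for (size_t in = 0; in < len; in++) {
        char c = buf[in];

        if (c == '\r') {
            /* CR+LF is one line ending: skip the LF that follows. */
            if (in + 1 < len && buf[in + 1] == '\n')
                in++;
            buf[out++] = repl;
        } else if (c == '\n') {
            buf[out++] = repl;
        } else {
            buf[out++] = c;
        }
    }
    return out;
}
```

Because the output never grows beyond the input, the rewrite can safely happen in the same buffer. The data should come from a file opened in binary mode so that the C library does not translate anything first.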

user230910
  • I think it's about time there was a function that catches them all or, the other way around, prints the right symbol for the current OS. Thanks for your help; I'd like to mark both of your answers as the right one. – user3085931 Aug 28 '14 at 12:07
  • This list shows you cannot use one single method for all *possible* files. On reading an unknown file, you can easily distinguish single `\r`, `\n` and `\r\n`, but how would you know when a *double* LF signifies a *single* line end? Even if you scan the entire file before deciding, you can't be sure. – Jongware Aug 28 '14 at 12:12
2

The end-of-line marker is operating-system specific. On some OSes it is just \n, on others it may be \r or a combination such as \r\n. The form feed \f may also sometimes be treated as an end-of-line.

On some systems, whether you pass the b mode flag to fopen(3) changes how the file is read: with b the file is opened in binary mode, and without it in text mode (where end-of-line sequences may be interpreted differently). You could also use getline(3) and treat the terminating characters as whitespace (e.g. test them with isspace(3)).
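To see which convention a given file actually uses, one option is to open it in binary mode and tally the raw bytes. This is a rough sketch under the assumption that the stream was opened with a mode such as "rb"; the struct `eol_counts` and the function `count_line_endings` are hypothetical names, not part of any library.

```c
#include <stdio.h>

/* Hypothetical tally of the line-ending styles seen in a stream. */
struct eol_counts {
    size_t lf;    /* lone "\n"   */
    size_t cr;    /* lone "\r"   */
    size_t crlf;  /* "\r\n" pair */
};

/* Hypothetical helper: count line endings in a stream that was
 * opened in binary mode, e.g. fopen(path, "rb"). */
struct eol_counts count_line_endings(FILE *fp)
{
    struct eol_counts n = {0, 0, 0};
    int c;
    int pending_cr = 0;  /* previous byte was CR, not yet classified */

    while ((c = fgetc(fp)) != EOF) {
        if (pending_cr) {
            pending_cr = 0;
            if (c == '\n') {
                n.crlf++;
                continue;
            }
            n.cr++;
        }
        if (c == '\r')
            pending_cr = 1;
        else if (c == '\n')
            n.lf++;
    }
    if (pending_cr)
        n.cr++;  /* stream ended with a lone CR */
    return n;
}
```

A file with only `crlf` hits was most likely written on Windows, one with only `lf` hits on a Unix-like system. Mixed counts suggest the file was edited on several systems, which is the ambiguous case mentioned in the comments above.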

BTW, on Linux the dos2unix(1) command might be useful.

Also, your app might receive a text file produced on some other OS (without conversion). I would use getline (or the older fgets(3) if you don't care about very long lines) and treat all whitespace characters (tab, newline, form feed, carriage return, etc.) the same way, as fscanf(3) and sscanf do.

I can't see why the exact end-of-line marker matters to you. Why can't you use getline (or perhaps fgets) and treat every end-of-line character (be it \n, \r, \f or some mix of them) equally, in other words as whitespace tested with isspace? This also handles the case of a text file edited on Windows or Mac OS X and passed to Linux, or vice versa.
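As a small illustration of the fgets-based approach, the sketch below strips whatever mix of \r and \n ends a line, so that "\n" and "\r\n" files look the same to the rest of the program. The name `strip_eol` is hypothetical. Note one assumption: fgets only splits on \n, so a file that uses lone \r as its line ending would arrive as a single long line and needs different handling.

```c
#include <string.h>

/* Hypothetical helper: remove trailing '\r' and '\n' characters from
 * a NUL-terminated line (for example one filled in by fgets) and
 * return the new length. */
size_t strip_eol(char *line)
{
    size_t n = strlen(line);

    while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r'))
        line[--n] = '\0';
    return n;
}
```

A typical use would be to call `strip_eol(buf)` right after each successful `fgets(buf, sizeof buf, fp)` and then write the line back out followed by whatever replacement character is wanted.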

Basile Starynkevitch
  • Is there a standard check that will catch them all? – user3085931 Aug 28 '14 at 11:45
  • Well, is there a list of all the possible ways to encode an LF, so I can at least hard-code them in a check? – user3085931 Aug 28 '14 at 11:53
  • First of all, thanks for your help. The problem is that I need to replace the `\n` with a different character, and if a line ending can consist of more than one character (like \r\n), I feel it gets pretty ugly to catch them all while leaving the rest of the text unchanged. – user3085931 Aug 28 '14 at 12:02
2

Try using \r\n rather than \n. The ASCII code of \n is 10 and the ASCII code of \r is 13, so in a simple text file created on Windows the line ending is the combination \r\n (carriage return followed by line feed).

Kailash Karki
0

I would recommend just opening the file as a text file and relying on the standard library's built-in conversions to handle this. Just read lines using fgets() and you should be fine.

unwind