
I have C code that reads one line at a time from a file opened in text mode, using

fgets(buf,200,fin); 

The input file that fgets() reads lines from is a command-line argument to the program.

Now fgets leaves the newline character included in the string copied to buf.

Somewhere down the line in the code I check

length = strlen(buf);

For some input files, which I guess were edited in a *nix environment, the newline is just '\n'.

But other test input files (which I guess were edited or created under Windows) use two characters to indicate a newline: '\r' followed by '\n'.

I want to remove the newline character(s) and put a '\0' in their place as the string terminator. So I have to do either

    if(len == (N+1))
    {
    if(buf[length-1] == '\n')
     {
         buf[length-2] = '\0'; //for a `\r\n` newline
     }
    } 

or

if(len == (N))
{
 if(buf[length-1] == '\n')
 {
     buf[length-1] = '\0'; //for a `\n` newline
 }
} 

Since the text files are passed as command-line arguments to the program, I have no control over how they were edited or composed, and hence cannot filter them with some tool to make the newlines consistent.

How can I handle this situation?

Is there any fgets-equivalent function in the standard C library (no extensions) that can handle these inconsistent newline characters and return a string without them?

goldenmean
  • The meaning of N in your code is not clear. Are you sure you don't mean 'if (len > 1)' and 'if (len > 0)'. Not to mention that you used 'length' in the rest of the code and not 'len'. – Remo.D Jul 14 '11 at 15:29
  • Possible duplicate of [Removing trailing newline character from fgets() input](http://stackoverflow.com/questions/2693776/removing-trailing-newline-character-from-fgets-input) – Philippe A. Nov 19 '15 at 22:17

3 Answers

2

I like to update length at the same time

if (buf[length - 1] == '\n') buf[--length] = 0;
if (buf[length - 1] == '\r') buf[--length] = 0;

or, to remove all trailing whitespace

/* remember to #include <ctype.h> */
while ((length > 0) && isspace((unsigned char)buf[length - 1])) {
    buf[--length] = 0;
}
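
The first snippet indexes `buf[length - 1]` without checking that `length` is nonzero, which is a problem for an empty string. A minimal sketch of a guarded version follows; the helper name `strip_eol` is hypothetical (it is not a standard library function) and it assumes `buf` is a valid NUL-terminated string:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes at most one trailing '\n' and then at most
   one trailing '\r' from buf, and returns the new length. */
size_t strip_eol(char *buf)
{
    size_t length = strlen(buf);

    /* Check length before indexing so an empty string is never read at -1. */
    if (length > 0 && buf[length - 1] == '\n') buf[--length] = '\0';
    if (length > 0 && buf[length - 1] == '\r') buf[--length] = '\0';
    return length;
}
```

After a successful `fgets(buf, 200, fin)`, calling `length = strip_eol(buf);` would then handle both "\n" and "\r\n" endings.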
pmg
  • Hmmm, I forgot the check `length > 0` in the first code block – pmg Jul 14 '11 at 10:02
  • About removing white space, what if the buf[] above has leading whitespace/s? – goldenmean Jul 15 '11 at 02:21
  • @goldenmean: the snippet above doesn't care about leading whitespace. The logic is different: you can either point to the first non-whitespace or move characters back. – pmg Jul 15 '11 at 06:42
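
The two options for leading whitespace mentioned in the last comment could look roughly like this. It is only a sketch: the function names `skip_leading` and `trim_leading` are made up for illustration, and both assume a valid NUL-terminated string:

```c
#include <ctype.h>
#include <string.h>

/* Option 1: return a pointer to the first non-whitespace character.
   The loop stops at the terminator because isspace('\0') is false. */
char *skip_leading(char *s)
{
    while (isspace((unsigned char)*s)) s++;
    return s;
}

/* Option 2: move the remaining characters back to the start of the buffer. */
void trim_leading(char *s)
{
    char *p = skip_leading(s);

    /* memmove is used because source and destination overlap;
       the +1 copies the terminating '\0' as well. */
    if (p != s) memmove(s, p, strlen(p) + 1);
}
```

Option 1 leaves the buffer untouched, so the original pointer is still the one to pass to `free()` if the buffer was allocated dynamically; option 2 keeps a single pointer at the cost of copying.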
1

I think your best (and easiest) option is to write your own strlen function:

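#include <stddef.h>

/* Truncates line at the first '\r' or '\n' and returns the new length. */
size_t zstrlen(char *line)
{
  char *s = line;

  /* Advance until the end of the string or the first line-ending character. */
  while (*s && *s != '\r' && *s != '\n') s++;
  *s = '\0';
  return (size_t)(s - line);
}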

Now, to calculate the length of the string excluding the newline character(s) and eliminating it(/them) you simply do:

fgets(buf,200,fin);
length = zstrlen(buf);

It works for Unix style ('\n'), Windows style ('\r\n') and old Mac style ('\r').
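
For comparison, the same truncation can be sketched with `strcspn` from `<string.h>`, which returns the length of the initial segment of a string that contains none of the listed characters. The wrapper name `chomp` below is hypothetical and used only for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: cuts line at the first '\r' or '\n' (if any)
   and returns the length of what remains. */
size_t chomp(char *line)
{
    /* n is the index of the first CR or LF, or strlen(line) if none. */
    size_t n = strcspn(line, "\r\n");

    line[n] = '\0';
    return n;
}
```

Like `zstrlen`, this cuts at the first '\r' or '\n' it finds, so a stray '\r' in the middle of a line would also truncate it.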

Note that there are faster (but non-portable) implementations of strlen that you can adapt to your needs.

Hope it helps, RD:

Remo.D
0

If you are troubled by the different line endings (\n and \r\n) on different machines, one way to neutralize them would be to use the dos2unix command (assuming you are working on Linux and have files edited in a Windows environment). That command replaces all Windows-style line endings with Unix-style line endings. The reverse, unix2dos, also exists. You can call these utilities from within the C program (via system(), perhaps) and then process each line the way you currently do. This would reduce the burden on your program.

Sriram
  • Pls see the OP once again. I have clearly mentioned that I would not want to use any tool to filter to make the newlines consistent. – goldenmean Jul 14 '11 at 10:09
  • You "would not want" (from comment above) to use any tool or you "cannot" (from question)? There is quite a difference between the two, IMO.. – Sriram Jul 14 '11 at 10:19