
I'm reading stdin and there are sometimes unix-style and sometimes windows-style newlines.

How to consume either type of newline?

MightyPork
    Assuming you do not read `char` by `char`, the most elegant way I know is described here: http://stackoverflow.com/a/28462221/694576 Probably a duplicate? – alk May 09 '15 at 14:29

2 Answers

Assuming you know there will be a newline, the solution is to consume one character, and then decide:

10 - LF ... Unix style newline
13 - CR ... Windows style newline

If it's 13, you have to consume one more character (10):

int x = fgetc(stdin); // Consume LF or CR (int rather than char, so EOF stays distinguishable)
if (x == 13) fgetc(stdin); // consume the LF that follows the CR
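
The two lines above assume that a CR is always followed by an LF. A slightly more defensive variant, sketched below, peeks at the character after a CR and pushes it back with ungetc() if it is not an LF. The name skip_newline() is made up for this illustration and is not part of the answer.

```c
#include <stdio.h>

/* Hypothetical helper (not from the answer above): consume one newline
 * sequence from `in`, that is LF, CR LF, or a lone CR.
 * Returns 1 if a newline was consumed, 0 otherwise. */
int skip_newline(FILE *in)
{
    int c = getc(in);           /* int, so EOF is distinguishable from data */

    if (c == '\n')
        return 1;               /* Unix style: LF */

    if (c == '\r') {
        c = getc(in);           /* Windows style: CR may be followed by LF */
        if (c != '\n' && c != EOF)
            ungetc(c, in);      /* lone CR: the next character is data */
        return 1;
    }

    if (c != EOF)
        ungetc(c, in);          /* not a newline: leave the stream untouched */
    return 0;
}
```

Note that after a lone CR this still has to read one character ahead, so on interactive input it may wait for the next keypress.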
MightyPork

There are a few more newline conventions than that. In particular, all four involving CR \r and LF \n (\n, \r, \r\n, and \n\r) are actually encountered in the wild.

For reading text input, possibly interactively, while supporting all four of those newline encodings at the same time, I recommend using a helper function along the lines of the following:

#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <ctype.h>
#include <stdio.h>
#include <errno.h>

size_t get_line(char **const lineptr, size_t *const sizeptr, char *const lastptr, FILE *const in)
{
    char  *line;
    size_t size, have;
    int    c;

    if (!lineptr || !sizeptr || !in) {
        errno = EINVAL; /* Invalid parameters! */
        return 0;
    }

    if (*lineptr) {
        line = *lineptr;
        size = *sizeptr;
    } else {
        line = NULL;
        size = 0;
    }

    have = 0;

    if (lastptr) {
        if (*lastptr == '\n') {
            c = getc(in);
            if (c != '\r' && c != EOF)
                ungetc(c, in);
        } else
        if (*lastptr == '\r') {
            c = getc(in);
            if (c != '\n' && c != EOF)
                ungetc(c, in);
        }
        *lastptr = '\0';
    }

    while (1) {

        if (have + 2 >= size) {

            /* Reallocation policy; my personal quirk here.
             * You can replace this with e.g. have + 128,
             * or (have + 2)*3/2 or whatever you prefer. */ 
            size = (have | 127) + 129;

            line = realloc(line, size);
            if (!line) {
                errno = ENOMEM; /* Out of memory */
                return 0;
            }

            *lineptr = line;
            *sizeptr = size;
        }

        c = getc(in);
        if (c == EOF) {
            if (lastptr)
                *lastptr = '\0';
            break;
        } else
        if (c == '\n') {
            if (lastptr)
                *lastptr = c;
            else {
                c = getc(in);
                if (c != EOF && c != '\r')
                    ungetc(c, in);
            }
            break;
        } else
        if (c == '\r') {
            if (lastptr)
                *lastptr = c;
            else {
                c = getc(in);
                if (c != EOF && c != '\n')
                    ungetc(c, in);
            }
            break;
        }            

        if (iscntrl(c) && !isspace(c))
            continue;

        line[have++] = c;
    }

    if (ferror(in)) {
        errno = EIO; /* I/O error */
        return 0;
    }

    line[have] = '\0';
    errno = 0; /* No errors, even if have were 0 */
    return have;
}

int main(void)
{
    char   *data = NULL;
    size_t  size = 0;
    size_t  len;
    char    last = '\0';

    setlocale(LC_ALL, "");

    while (1) {
        len = get_line(&data, &size, &last, stdin);
        if (errno) {
            fprintf(stderr, "Error reading standard input: %s.\n", strerror(errno));
            return EXIT_FAILURE;
        }

        if (!len && feof(stdin))
            break;

        printf("Read %lu characters: '%s'\n", (unsigned long)len, data);
    }

    free(data);
    data = NULL;
    size = 0;

    return EXIT_SUCCESS;
}

Except for the errno constants I used (EINVAL, ENOMEM, and EIO), the above code is C89, and should be portable.

The get_line() function dynamically reallocates the line buffer to be long enough when necessary. For interactive inputs, you must accept a newline at the first newline-ish character you encounter (as trying to read the second character would block, if the first character happens to be the only newline character). If specified, the one-character state at lastptr is used to detect and handle correctly any two-character newlines at the start of the next line read. If not specified, the function will attempt to consume the entire newline as part of the current line (which is okay for non-interactive inputs, especially files).
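
To show the deferred handling in isolation, here is a stripped-down sketch of the same idea. It is not a replacement for get_line(): the name read_line_deferred(), the fixed-size buffer (cap is assumed to be at least 1), and the -1 return value at end of input are simplifications made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical, simplified sketch: fixed-size buffer, no error handling.
 * Reads one line into buf (at most cap - 1 characters are stored) and
 * returns its length, or -1 at end of input with nothing read.
 * *last holds the newline character that ended the previous line, or '\0'. */
long read_line_deferred(char *buf, size_t cap, char *last, FILE *in)
{
    size_t have = 0;
    int    c;

    /* Finish a two-character newline left over from the previous call. */
    if (*last == '\n' || *last == '\r') {
        const int partner = (*last == '\n') ? '\r' : '\n';
        c = getc(in);
        if (c != partner && c != EOF)
            ungetc(c, in);
    }
    *last = '\0';

    while ((c = getc(in)) != EOF) {
        if (c == '\n' || c == '\r') {
            *last = (char)c;        /* stop here; do not read ahead */
            buf[have] = '\0';
            return (long)have;
        }
        if (have + 1 < cap)
            buf[have++] = (char)c;  /* characters beyond the capacity are dropped */
    }

    buf[have] = '\0';
    return have ? (long)have : -1;
}
```

The point is that the function returns as soon as it sees the first newline character, and only the next call looks at whether its partner follows. An interactive user therefore never has to type anything extra before a line is delivered.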

The newline is not stored or counted in the line length. For added ease of use, the function also skips non-whitespace control characters. Embedded nul characters (\0) in particular often cause headaches, so having the function skip those altogether is often a robust approach.
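
The filtering rule can be tried on its own with a small sketch like the one below. strip_controls() is a hypothetical helper written for this illustration; it applies the same iscntrl()/isspace() test to a buffer already in memory.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy len bytes from src to dst, dropping control
 * characters that are not whitespace. dst needs room for len + 1 bytes.
 * Returns the number of characters kept. */
size_t strip_controls(char *dst, const char *src, size_t len)
{
    size_t i, kept = 0;

    for (i = 0; i < len; i++) {
        /* The ctype functions need an unsigned char value (or EOF). */
        const int c = (unsigned char)src[i];

        if (iscntrl(c) && !isspace(c))
            continue;           /* e.g. '\0' or '\a' (bell) */

        dst[kept++] = (char)c;  /* ordinary characters, and e.g. '\t' */
    }
    dst[kept] = '\0';
    return kept;
}
```

In the default "C" locale, a tab is both a control character and whitespace, so it is kept, while nul and bell are dropped.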

As a final touch, the function always sets errno (to zero if no error occurred, to a nonzero error code otherwise), including in ferror() cases, so detecting error conditions is trivial.
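
The reason this convention helps is that 0 is a perfectly valid return value (an empty line), so the return value alone cannot signal failure. A minimal sketch of the same pattern on a simpler function follows; count_bytes() is a made-up example, not part of the code above.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical example of the convention: 0 is a valid result,
 * so errno (always set before returning) tells success from failure. */
size_t count_bytes(FILE *in)
{
    size_t n = 0;

    if (!in) {
        errno = EINVAL;         /* invalid parameter */
        return 0;
    }

    while (getc(in) != EOF)
        n++;

    if (ferror(in)) {
        errno = EIO;            /* read error, not end of input */
        return 0;
    }

    errno = 0;                  /* success, even when n is 0 */
    return n;
}
```

A caller checks errno right after the call, before any other library call that might change it.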

The above code snippet includes a main(), which reads and displays input lines, using the current locale for the meaning of "non-whitespace control character" (!isspace(c) && iscntrl(c)).

Although this is definitely not the fastest mechanism to read input, it is not that slow, and it is a very robust one.

Questions?

Nominal Animal