
I was wondering if it is possible to ignore the newlines when reading a file. I've written a little program that reads the characters from a file and formats them, but the newlines in the document mess up the formatting: I end up with double spacing where I only want single spacing.

Is it possible to disable this feature, so that the only newlines my program prints are the ones I insert in my own print calls?

Gaarsin
  • Show your work so far and someone will be more likely to help you. – edtheprogrammerguy Dec 10 '16 at 21:23
  • Certainly not with the standard input libraries. But it's quite easy to go through each read block of characters and remove the newlines. Why don't you do that? – Gene Dec 10 '16 at 21:23
  • @Gene Isn't getc in the standard input library? – nicomp Dec 10 '16 at 21:32
  • Show your code that ends up with double spacing. Show a sample input file (3 lines should be enough) and the desired output. Depending on what you're after, you might read words with `scanf("%999s", buffer_1000)` which will blithely treat spaces, tabs and newlines as equivalent. Please read about creating an MCVE ([MCVE]). – Jonathan Leffler Dec 10 '16 at 21:32
  • When using `fgets` (the obvious choice) to read a file by line into `str`, you can remove the added `newline` with `str [ strcspn(str, "\r\n") ] = 0;` and that is harmless when there is no `newline` appended. – Weather Vane Dec 10 '16 at 21:33
  • @WeatherVane That unnecessarily scans the whole string. You can instead just check the last two characters. – Schwern Dec 10 '16 at 21:44
  • @Schwern if you mean the whole line, it only reads as much as the string buffer will hold. – Weather Vane Dec 10 '16 at 21:54
  • @WeatherVane I mean it's an O(n) operation when it could be an O(1). Wait... nevermind... `strlen` is O(n) in C. – Schwern Dec 10 '16 at 21:57
  • @nicomp Of course it is, but there's no way to tell the standard library to skip newlines. You read the newlines and then ignore them. – Gene Dec 10 '16 at 21:58
  • @Schwern thank you. Perhaps a sharper solution would read char by char. – Weather Vane Dec 10 '16 at 21:59
  • @WeatherVane That's what `strcspn` already does. I'd rather go with an optimized standard library function than something home rolled. – Schwern Dec 10 '16 at 22:30
  • @Schwern `strcspn` does not "read char by char", it scans a string already read. As you previously said, each string has to be scanned to use either `strlen` or `strcspn`. You are now contradicting yourself by saying "No don't read by char, scanning is more efficient". – Weather Vane Dec 10 '16 at 22:45
  • @WeatherVane Oh, you meant reading the file character by character with `fgetc`. I thought you meant reading the string character by character with a while loop. Since file reads are block buffered, calling `fgetc` and looking for a newline might be a touch more efficient than `fgets` + `strlen` or `strcspn`. I might benchmark it. – Schwern Dec 10 '16 at 23:01
  • @Schwern we are talking the same way at last! – Weather Vane Dec 10 '16 at 23:29
  • @WeatherVane I coded them up and found `fgets` + `strlen` is fastest, with `fgets` + `strcspn` a very close second and `fgetc` trailing badly, although I don't think my `fgetc` implementation is very good. I used /usr/share/dict/words and the text of the SQL 1992 standard for testing, with output to /dev/null. [Here's the code](https://gist.github.com/schwern/ae3250fdd2b21277c9987c65a49d13e7). We can talk about it [in chat](http://chat.stackoverflow.com/rooms/54304/c). – Schwern Dec 11 '16 at 00:25
  • @Schwern that was an interesting comparison. Sorry I went offline in my time zone. – Weather Vane Dec 11 '16 at 20:11
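
The two approaches discussed in the comments above can be sketched roughly as follows. This is only an illustration, not code from any of the commenters: the function names `read_line_stripped` and `copy_without_newlines` are made up for this example, and the fixed-size buffer is an assumption. The first reads a line with `fgets` and cuts it at the first `'\r'` or `'\n'` with `strcspn`; the second reads character by character with `fgetc` and simply skips line-ending characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets() and cut it at the first
   '\r' or '\n'. Returns buf, or NULL at end of file or on error. */
char *read_line_stripped(char *buf, int size, FILE *fp) {
    if (fgets(buf, size, fp) == NULL) {
        return NULL;
    }
    /* strcspn() returns the index of the first '\r' or '\n', or the length
       of the string if neither is present, so the write is always in bounds. */
    buf[strcspn(buf, "\r\n")] = '\0';
    return buf;
}

/* Hypothetical helper: copy everything from in to out one character at a
   time, dropping '\n' and '\r'. Returns the number of characters written. */
long copy_without_newlines(FILE *in, FILE *out) {
    long written = 0;
    int c; /* int, not char, so that EOF can be told apart from real data */

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n' || c == '\r') {
            continue; /* read the newline, then ignore it */
        }
        if (fputc(c, out) == EOF) {
            break; /* write error */
        }
        written++;
    }
    return written;
}
```

With either helper, the only newlines in the output are the ones the program prints itself, for example by calling `puts(buf)` after `read_line_stripped` succeeds. Note that a line longer than the buffer is returned by `fgets` in several pieces.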

1 Answer


C doesn't provide much in the way of conveniences. You have to write them yourself or use a third-party library such as GLib. If you're new to C, get used to it: you're working very close to the bare metal.

Generally you read a file line by line with fgets() or, my preference, POSIX getline(), then strip the final newline yourself: look at the last character and, if it is a newline, replace it with a null character.

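#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */

/* fp is assumed to be a FILE * already opened for reading */
char *line = NULL;
size_t line_capacity = 0; /* getline() will allocate line memory */
ssize_t line_len;

/* getline() returns the number of bytes read, or -1 at end of file or on error */
while( (line_len = getline( &line, &line_capacity, fp )) > 0 ) {
    /* Use the returned length rather than strlen(), which would give 0
       for a line that starts with a null character */
    size_t last_idx = (size_t)line_len - 1;

    if( line[last_idx] == '\n' ) {
        line[last_idx] = '\0';
    }

    /* No double newline: puts() appends its own */
    puts(line);
}

free(line); /* getline() allocated it, so we free it */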

You can put this into a little function for convenience. In many languages it's referred to as chomp.

#include <stdbool.h>
#include <string.h>

bool chomp( char *str ) {
    size_t len = strlen(str);

    /* Empty string */
    if( len == 0 ) {
        return false;
    }

    size_t last_idx = len - 1;
    if( str[last_idx] == '\n' ) {
        str[last_idx] = '\0';
        return true;
    }
    else {
        return false;
    }
}

It will be educational for you to implement fgets and getline yourself to understand how reading lines from a file actually works.
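
As a starting point for that exercise, here is a minimal sketch of a line reader built on `fgetc`. It is a simplified stand-in, not how any real C library implements `getline`: the name `read_line`, the initial capacity of 16, and the doubling strategy are all assumptions made for illustration. Unlike `getline`, it allocates a fresh buffer for every line and drops the trailing newline itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for getline(): reads one line from fp
   into a freshly allocated buffer, without the trailing '\n'.
   Returns NULL when nothing is left to read (a read error is treated the
   same way) or when allocation fails. The caller must free() the result. */
char *read_line(FILE *fp) {
    size_t capacity = 16;
    size_t length = 0;
    char *line = malloc(capacity);
    if (line == NULL) {
        return NULL;
    }

    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        /* Grow the buffer, always keeping one byte free for the terminator. */
        if (length + 1 >= capacity) {
            capacity *= 2;
            char *bigger = realloc(line, capacity);
            if (bigger == NULL) {
                free(line);
                return NULL;
            }
            line = bigger;
        }
        line[length++] = (char)c;
    }

    /* Hit end of file before reading any character: no more lines. */
    if (c == EOF && length == 0) {
        free(line);
        return NULL;
    }

    line[length] = '\0';
    return line;
}
```

A caller would loop with `while ((line = read_line(fp)) != NULL) { puts(line); free(line); }`. A real `getline` reuses the caller's buffer across calls and reports the line length, which avoids an allocation per line.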

Schwern
  • Hello, thanks for the detailed response. I will award you best answer, and close the thread. I found a way to do what I wanted. I used the following if anyone was wondering: `if (characters[i-1] != '\n') {printf("\n");}` – Gaarsin Dec 10 '16 at 21:44
  • I will take your advice, and I thank you for it. I only started writing code a couple of weeks back and I've only used C so far so I'm not very experienced. – Gaarsin Dec 10 '16 at 21:47
  • @Gaarsin If you may take another piece of advice: start with a more modern language, particularly one that does memory management for you and has better string handling. Learning programming is hard enough: variables, loops, functions, indentation, objects, algorithmic thinking... without also having to worry about memory management every time you just want to concatenate two strings. – Schwern Dec 10 '16 at 21:56
  • @Schwern Corner case: Consider what happens to `chomp()` if a line begins with a null character, given `size_t last_idx = strlen(line) - 1; if( line[last_idx] == '\n' )`. It leads to certain UB. `line[strcspn(line, "\n")] = 0;` is a nice alternative. – chux - Reinstate Monica Dec 11 '16 at 01:30
  • @chux Good point, I'll throw in a check and return. I'm not a fan of the `strcspn` version because it will eat the first newline, whereas `chomp` only eats a newline at the end. – Schwern Dec 11 '16 at 03:32
  • @Schwern Note that a string returned from `fgets()` can only have, at most, 1 `'\n'`. Still a good and defensive improvement to `chomp()`. [ref](http://stackoverflow.com/a/27729970/2410359) – chux - Reinstate Monica Dec 11 '16 at 03:57
  • @chux Right, it's intended as a general purpose string function. – Schwern Dec 11 '16 at 04:28
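
The difference discussed in these comments only shows up on strings that contain more than one newline, which a single `fgets` call never produces. A small sketch makes it concrete. The names `chomp_last` and `cut_at_first_newline` are invented for this illustration: the first is a compact variant of the answer's `chomp`, the second wraps the `strcspn` one-liner.

```c
#include <stdbool.h>
#include <string.h>

/* Removes a newline only if it is the very last character.
   Returns true if something was removed. */
bool chomp_last(char *str) {
    size_t len = strlen(str);
    if (len == 0 || str[len - 1] != '\n') {
        return false;
    }
    str[len - 1] = '\0';
    return true;
}

/* Truncates the string at the first newline, wherever it occurs.
   Does nothing if the string contains no newline. */
void cut_at_first_newline(char *str) {
    str[strcspn(str, "\n")] = '\0';
}
```

Given `"one\ntwo\n"`, `chomp_last` leaves `"one\ntwo"` while `cut_at_first_newline` leaves `"one"`. For a line that came straight from `fgets`, the two give the same result.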