
I am after a simple task: reading one line at a time from a file, printing each line, and appending all the content to a char array. It all started with a Segmentation fault (core dumped) in my project; I kept isolating my code until I reached this:

#include <stdio.h>
#include <string.h>
int main(void)
{
    FILE *fp;
    fp = fopen("read.txt","r");
    char buffer[255];
    char longBuff[1024] = "";
    while(fgets(buffer, 255, fp)) {
        printf("%s\n",buffer);
        strcat(longBuff, buffer);
    }
    fclose(fp);
    printf("WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWTF%s\n", longBuff);
}

The read.txt file:

short
this is Longer
+++++
sad

And the Output:

sad++is Longer
sad++is LongerWWWWWWWWWWWWWWWWWWWTFshort

When I was confidently expecting:

short
this is Longer
+++++
sad
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWTFshortthis is Longer+++++sad

I have been over multiple similar questions and most answers refer to carriage returns, but I still don't understand this behavior or what causes it.

Cezar Cobuz
  • Is it possible for you to use ifstream and string? If so, you could very conveniently use std::getline() for this task. –  Jan 08 '19 at 02:51
  • Yes, although I would have preferred C over C++. Will the output be different? I am trying it right now. – Cezar Cobuz Jan 08 '19 at 02:58
  • I suspect you are in Unix/Linux but opening a text file with Windows line endings. Try hex-dumping the text file and look for \r\n (0xD 0xA). Outputting the carriage return resets the cursor to the start of the line. To fix this you can either have your program remove the carriage returns, or fix the text file to not contain them. – M.M Jan 08 '19 at 03:09 (a short demonstration of this cursor-reset behavior follows these comments)
  • I am on Linux. What should be the file extension for normal text? I tried leaving the file with no extension and I'm back to `Segmentation fault (core dumped)`. I never thought a simple task could cause me so much trouble. – Cezar Cobuz Jan 08 '19 at 03:14
  • Add `buffer[strcspn (buffer, "\r\n")] = 0;` immediately below `while (fgets(...))` – David C. Rankin Jan 08 '19 at 03:14
  • @David C. Rankin Output is now :short WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWTFshort – Cezar Cobuz Jan 08 '19 at 03:17
  • Note that fgets keeps the \n at the end of the string in the buffer, before the \0, so you don't need a \n in the printf. EDIT: Additionally, you may just want to convert the \r\n's to \n anyway, if it's possible for you. – Anonymous1847 Jan 08 '19 at 03:17
  • Funny, I ran your code on Linux with `buffer[strcspn (buffer, "\r\n")] = 0;` and with your input file with CRLF endings, and I get your expected output. Have you hex-dumped your file yet? Are you sure it isn't UTF-16? – David C. Rankin Jan 08 '19 at 03:20
  • I created a new file on Linux with no extension, and with the added `buffer[strcspn (buffer, "\r\n")] = 0;` it worked for me too, but it is still magic, thank you anyhow! So it had to do with the way I created that initial file... – Cezar Cobuz Jan 08 '19 at 03:24
  • This is a guess, but did you create your original file in VS-Code, or download the file from the internet? Check it with `hexdump -Cv read.txt`. That way you can look at the bytes in the file. You can also use the Linux `file` utility to determine what the file is, e.g. `file read.txt`. Your file may also have a BOM (byte order mark) that is screwing things up. You won't know until you look. Also, to eliminate some mystery, all `buffer[strcspn (buffer, "\r\n")] = 0;` does is trim the line endings from `buffer`. (`strcspn` returns the length of the initial segment of `buffer` containing no characters from `"\r\n"`.) – David C. Rankin Jan 08 '19 at 03:26
  • The file was created with gedit, either from gedit's new file or from the terminal with `gedit read.txt` (can't really recall); then I added a couple of lines and saved it. Output of `file read`: `read: ASCII text, with CR line terminators`. The new file (where the code runs as expected with and without `strcspn()`) is just `ASCII text`. – Cezar Cobuz Jan 08 '19 at 03:30
  • OK, so you see where it says `"with CR line terminators"`? That is flat nuts: the `CR` line ending is the Mac (pre-OSX) line ending. However you told `gedit` to save that file, don't do it again `:)`. I suspect the file you added to originally came from a very old Macintosh computer... – David C. Rankin Jan 08 '19 at 03:32
  • Amazing, since I created everything on my local Ubuntu from scratch – Cezar Cobuz Jan 08 '19 at 03:34
  • There are options in all modern editors that allow saving with `LF` (line-feed, Unix line endings), `CRLF` (carriage-return line-feed, DOS line endings) and, yes, `CR` (carriage-return, Mac pre-OSX line endings). Somehow you told `gedit` to do the latter (it should default to `LF` line endings). When you say "created from scratch", if you just opened `gedit` and typed your file, saved, and still ended up with `CR` line endings, you need to check your settings so it doesn't happen again. Good luck with your coding. – David C. Rankin Jan 08 '19 at 03:38
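
To make the cursor-reset behavior M.M describes above concrete, here is a minimal, self-contained demonstration. It is not part of the original question; it only shows what a typical terminal does when a string containing '\r' is printed:

#include <stdio.h>

int main(void)
{
    /* '\r' moves the cursor back to column 0 on a typical terminal,
       so later text overwrites what was already printed on that line. */
    printf("short\rthis is Longer\r+++++\rsad\n");
    /* usually displays: sad++is Longer */
    return 0;
}

This is the same overwriting seen in the question's output: with CR-only line terminators, fgets() finds no '\n', so the whole small file comes back as one long string and is printed in one go.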

2 Answers


The text file likely originated on a platform with "\r\n" line endings, as @M.M suggested.

A simple solution takes advantage of the fact that, should a "\r" occur, it is overwhelmingly likely to be part of the line ending and can easily be lopped off with strcspn().

I now see @David C. Rankin suggested this.

while(fgets(buffer, sizeof buffer, fp)) {
  // Find length of string not made up of '\n', '\r', '\0'
  // This nicely lops off the line ending, be it "\n", "\r\n" or missing.
  buffer[strcspn(buffer, "\n\r")] = '\0';
  printf("<%s>\n",buffer);
}
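
Putting the trim together with the original program, a minimal sketch could look like the following. The fopen() check is an addition (an unopenable file is the likely cause of the "Segmentation fault" mentioned in the comments), and the length test before strcat() is an extra precaution against overflowing longBuff, not something the original code had; the final banner text is also abbreviated here:

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("read.txt", "r");
    if (fp == NULL) {                      /* added check: avoids the crash when the file is missing */
        perror("fopen read.txt");
        return 1;
    }

    char buffer[255];
    char longBuff[1024] = "";

    while (fgets(buffer, sizeof buffer, fp)) {
        buffer[strcspn(buffer, "\n\r")] = '\0';     /* lop off "\n", "\r\n" or a stray "\r" */
        printf("%s\n", buffer);
        if (strlen(longBuff) + strlen(buffer) + 1 <= sizeof longBuff)  /* append only if it fits */
            strcat(longBuff, buffer);
    }
    fclose(fp);
    printf("All lines concatenated: %s\n", longBuff);
    return 0;
}

With the question's read.txt this should print the four lines and then "shortthis is Longer+++++sad" on the final line.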

Unfortunately, when a text file's line endings employ "\r" only, fgets() (on a system expecting "\n") will not see any line ending at all. In that case a new approach is needed.
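
One possible workaround for such a file, sketched below, is to read character by character with fgetc() and accept "\n", "\r\n", or a lone "\r" as the line terminator. The helper name read_line_any() is made up for this sketch; it is not a standard function:

#include <stdio.h>

/* Read one line into buf, treating "\n", "\r\n" or a lone "\r" as the line ending.
   Returns the number of characters stored (terminator excluded),
   or -1 at end-of-file when nothing was read. Overlong lines are truncated. */
static int read_line_any(FILE *fp, char *buf, size_t size)
{
    size_t i = 0;
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\r') {                 /* "\r" alone, or the first half of "\r\n" */
            int next = fgetc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);        /* lone "\r": give the next character back */
            break;
        }
        if (c == '\n')
            break;
        if (i + 1 < size)                /* leave room for the terminating '\0' */
            buf[i++] = (char)c;
    }
    buf[i] = '\0';
    return (c == EOF && i == 0) ? -1 : (int)i;
}

Such a loop can replace the fgets() call when the input may use any of the three line-ending conventions.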

chux - Reinstate Monica

As I am using Linux, the problem was with the input file. After running `file read` I got `read: ASCII text, with CR line terminators`, and those CR line terminators were causing that strange overwriting behavior. I created a new input file (`file newFile` reports `ASCII text`) with the same text, and the output was as expected.
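
For completeness, instead of retyping the file, an existing file with CR terminators can also be normalized programmatically. The sketch below is only an illustration (the output file name read_lf.txt is made up); it copies the file byte by byte and writes a single "\n" wherever it sees "\r" or "\r\n":

#include <stdio.h>

int main(void)
{
    FILE *in = fopen("read.txt", "rb");      /* the file with CR line terminators */
    FILE *out = fopen("read_lf.txt", "wb");  /* normalized copy (example name) */
    if (in == NULL || out == NULL) {
        perror("fopen");
        return 1;
    }

    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c == '\r') {
            int next = fgetc(in);
            if (next != '\n' && next != EOF)
                ungetc(next, in);            /* lone CR: don't swallow the next character */
            fputc('\n', out);                /* one LF replaces CR or CRLF */
        } else {
            fputc(c, out);
        }
    }

    fclose(in);
    fclose(out);
    return 0;
}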

Cezar Cobuz