Squeeze blank lines into one blank line in C

Question

Hello, this is about the same problem as the question below, but with different code.

Replacing multiple new lines in a file with just one

int main(void){

    format();
    printf("\n");
    return 0;
}

void format(){
    int c;
    size_t nlines = 1;
    size_t nspace = 0;

    while (( c= getchar()) != EOF ){

        /*TABS*/
        if(c == '\t'){
            c = ' ';
        }
        /*SPACES*/
        if (c ==' '){
            if(nspace > 0){
                continue;
            }
            else{
                putchar(c);
                nspace++;
                nlines = 0;
            }
        }

        /*NEW LINE*/
        else if(c == '\n'){
            if(++nlines >2){
                continue;
            }
            else {
                nlines++;
                nspace = 0;
            }
            putchar(c);
        }   
        else{
            putchar(c);
            nspace = 0;
            nlines = 0;
        }       
    }
}

I want to squeeze multiple blank lines into a single blank line, but it doesn't seem to work. Also, on the Cygwin terminal, the output on stdout ends with an extra blank line even though the input doesn't have a blank line at the end.

For example
INPUT

Hello   Hi\n
\n
\n
Hey\t\tHola\n

DESIRED OUTPUT

Hello Hi\n
\n
Hey Hola\n

ACTUAL OUTPUT

Hello Hi\n
Hey Hola\n

Please explain!

Is the output you provided the output you want, or the output you are getting? — Shashwat Kumar, Jun 09 '17 at 04:40
`isspace()` can make your code more compact and simple for reading — VolAnd, Jun 09 '17 at 05:04
Do **not** modify your code after you've gotten answers in such a way that you invalidate the answers. That is not kosher at all. — Jonathan Leffler, Jun 09 '17 at 05:13
My instructor doesn't like it. I used a nested while loop, which worked, but I had to redo it because he doesn't like that approach. Thank you anyway! @VolAnd — dyingStudent, Jun 09 '17 at 05:17
Oh, I am so sorry! I thought it would be helpful for people. I am so sorry. @JonathanLeffler — dyingStudent, Jun 09 '17 at 05:18
The posted code does not compile! Among other things, it is missing the needed `#include` statements and prototypes for the subfunctions (like `void format( void );`). — user3629249, Jun 10 '17 at 02:50
@user3629249 Obviously we all know we need that, so I didn't post it in the first place. :) That's also not the point. — dyingStudent, Jun 11 '17 at 20:28
When asking a question about a run-time problem (as your question is), there are certain requirements that must be met. Otherwise the question is considered 'off topic' and will be voted to be closed. The requirements are: 1) post short code that compiles cleanly and still has the problem; 2) post the actual inputs; 3) post the expected outputs; 4) post the actual outputs. Your question is 'off topic' because the posted code does not compile cleanly. As for 'we all know we need that': are you expecting us to guess which header files you actually included? — user3629249, Jun 11 '17 at 23:22

score 1 · Answer 1 · answered Jun 09 '17 at 04:50


You're incrementing nlines twice:

else if(c == '\n'){
    if(++nlines >2){  /* incremented here */
        continue;
    }
    else {
        nlines++;     /* incremented here */
        nspace = 0;
    }
    putchar(c);
}

You just want to do it once. I'd suggest just incrementing the counter until it hits 2 and then not incrementing it any more. That just means a small change:

    if(nlines >= 2){
        continue;
    }
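To see paddy's change in context, here is a minimal sketch of the question's format() loop with the fix applied. It is an adaptation, not code from the thread: the hypothetical format_str() reads from a string and writes to a caller-supplied buffer instead of calling getchar() and putchar(), purely so the behaviour can be checked without a terminal. The counter logic is otherwise the question's own.

```c
#include <stddef.h>

/* Hypothetical helper: the question's format() logic with paddy's fix.
 * Reads from the string `in` instead of getchar() and writes to `out`
 * instead of putchar(). Each input character produces at most one output
 * character, so `out` needs room for strlen(in) + 1 bytes. */
void format_str(const char *in, char *out)
{
    size_t nlines = 1;   /* same initial values as the question */
    size_t nspace = 0;
    char c;

    while ((c = *in++) != '\0') {
        if (c == '\t')           /* tabs are treated as spaces */
            c = ' ';
        if (c == ' ') {
            if (nspace > 0)
                continue;        /* drop the rest of a run of spaces */
            *out++ = c;
            nspace++;
            nlines = 0;
        } else if (c == '\n') {
            if (nlines >= 2)     /* paddy's fix: test without incrementing */
                continue;
            nlines++;            /* ...and increment exactly once */
            nspace = 0;
            *out++ = c;
        } else {
            *out++ = c;
            nspace = 0;
            nlines = 0;
        }
    }
    *out = '\0';
}
```

With the sample input from the question, this yields "Hello Hi\n\nHey Hola\n", the desired output. The question's main() still calls printf("\n") after format() returns, so the real program would print one more newline at the very end.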

answered Jun 09 '17 at 04:50 by paddy

Hello, I modified my code and it works now, except that on the Cygwin terminal it gives me extra blank lines at the end. – dyingStudent Jun 09 '17 at 05:05
@dyingStudent: Your code unconditionally adds a newline at the end of the output — which is evidently not what you want. However, your code also doesn't recognize whether the file ended without a newline. You can see how I resolved it in my answer. – Jonathan Leffler Jun 09 '17 at 05:10

Jonathan Leffler · Accepted Answer · 2017-06-09T05:57:32.540

Here's a variant of your code. I eliminated the format() function, incorporating it directly into main() (which is unusual for me, since most programs on SO don't use enough functions). The code now treats spaces and newlines more symmetrically, which also fixes the double-increment problem identified in paddy's answer. It also prints a newline at the end only if there wasn't already one there, which normalizes files that do not end with a newline. The initialization nlines = 1 deals with multiple newlines at the start of the file; that part was already well done.

#include <stdio.h>

int main(void)
{
    int c;
    size_t nlines = 1;
    size_t nspace = 0;

    while ((c = getchar()) != EOF)
    {
        if (c == '\t')
            c = ' ';
        if (c == ' ')
        {
            if (nspace < 1)
            {
                putchar(c);
                nspace++;
                nlines = 0;
            }
        }
        else if (c == '\n')
        {
            if (nlines < 2)
            {
                putchar(c);
                nlines++;
                nspace = 0;
            }
        }
        else
        {
            putchar(c);
            nspace = 0;
            nlines = 0;
        }
    }
    if (nlines == 0)
        putchar('\n');
    return 0;
}

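My program was called sb73, and my testing uses some Bash-specific notation ($'...' quoting). Note that echo appends its own newline, so the first two inputs actually end with two newlines; the last test input is written with printf '%s' and does not include a final newline. The outputs use ⌴ to indicate a newline: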

$ echo $'Hello   Hi\n\n\nHey\t\tHola\n' | sb73
Hello Hi⌴
⌴
Hey Hola⌴
⌴
$

and:

$ echo $'\n\nHello   Hi\n\n\n    Hey\t\tHola\n' | sb73
⌴
Hello Hi⌴
⌴
 Hey Hola⌴
⌴
$

and:

$ printf '%s' $'\n\nHello   Hi\n\n\n    Hey\t\tHola' | sb73
⌴
Hello Hi⌴
⌴
 Hey Hola⌴
$
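The same three checks can be reproduced without a shell. The sketch below is a hypothetical string-based adaptation of the program above (sb73_str() is an illustrative name, not part of the answer): the loop body is unchanged apart from reading from in and writing to out. The test strings include the extra newline that echo appends in the first two cases.

```c
#include <stddef.h>

/* Hypothetical string-based version of sb73: same logic as the answer's
 * main(), but reads from `in` and writes to `out`. Because a final newline
 * may be appended, `out` needs room for strlen(in) + 2 bytes. */
void sb73_str(const char *in, char *out)
{
    size_t nlines = 1;   /* start of input counts as "after a newline" */
    size_t nspace = 0;
    char c;

    while ((c = *in++) != '\0') {
        if (c == '\t')
            c = ' ';
        if (c == ' ') {
            if (nspace < 1) {    /* keep only the first space of a run */
                *out++ = c;
                nspace++;
                nlines = 0;
            }
        } else if (c == '\n') {
            if (nlines < 2) {    /* keep at most two consecutive newlines */
                *out++ = c;
                nlines++;
                nspace = 0;
            }
        } else {
            *out++ = c;
            nspace = 0;
            nlines = 0;
        }
    }
    if (nlines == 0)             /* input did not end with a newline */
        *out++ = '\n';
    *out = '\0';
}
```

For example, sb73_str("\n\nHello   Hi\n\n\n    Hey\t\tHola", buf) leaves "\nHello Hi\n\n Hey Hola\n" in buf, matching the third run above.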

Handling CRLF line endings

The comments point out that the code above doesn't work on a Cygwin terminal, and the plausible reason is that the data being processed has CRLF line endings. There are various ways around this. One is to find a way of forcing standard input into text mode. In text mode, CRLF line endings should be mapped to Unix-style '\n' (NL or LF only) endings on input, and Unix-style line endings should be mapped to CRLF line endings on output.

Alternatively, it would be possible simply to ignore CR characters:

--- sb73.c  2017-06-08 22:04:28.000000000 -0700
+++ sb47.c  2017-06-08 22:40:24.000000000 -0700
@@ -19,6 +19,8 @@
                 nlines = 0;
             }
         }
+        else if (c == '\r')
+            continue;    // Windows?
         else if (c == '\n')
         {
             if (nlines < 2)

That's a unified diff showing the two extra lines in the code. Alternatively, it is possible to treat a CR not followed by LF as a regular character, while still treating CR followed by LF as a newline:

--- sb73.c  2017-06-08 22:04:28.000000000 -0700
+++ sb59.c  2017-06-08 22:42:43.000000000 -0700
@@ -19,6 +19,17 @@
                 nlines = 0;
             }
         }
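+        else if (c == '\r')
+        {
+            int next = getchar();   // peek at the character after CR
+            if (next != EOF)
+                ungetc(next, stdin); // push it back so it is not lost
+            if (next == '\n')
+                continue;           // CRLF: the '\n' is handled next time
+            putchar('\r');          // lone CR: treat as an ordinary character
+            nspace = 0;
+            nlines = 0;
+        }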
         else if (c == '\n')
         {
             if (nlines < 2)

There's probably a way to write a state machine that handles CR, but that would be more complex.
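For illustration, here is one possible shape for such a state machine. It is a sketch under assumptions, not the answer's code: instead of reading ahead with getchar() and pushing back with ungetc(), it records a pending CR in an extra state flag and decides what to do with it when the next character arrives. A CRLF pair becomes a single newline; a CR followed by anything else (or at the end of input) is emitted as an ordinary character. As with the other sketches, the hypothetical squeeze_crlf() works on strings rather than stdin.

```c
#include <stddef.h>

/* Hypothetical CR-aware variant: state is (pending_cr, nlines, nspace).
 * `out` needs room for strlen(in) + 2 bytes. */
void squeeze_crlf(const char *in, char *out)
{
    size_t nlines = 1;
    size_t nspace = 0;
    int pending_cr = 0;   /* saw '\r', waiting to see what follows */
    char c;

    while ((c = *in++) != '\0') {
        if (pending_cr) {
            pending_cr = 0;
            if (c != '\n') {      /* lone CR: an ordinary character */
                *out++ = '\r';
                nspace = 0;
                nlines = 0;
            }                     /* CRLF: the CR is simply dropped */
        }
        if (c == '\r') {
            pending_cr = 1;       /* decide when the next character arrives */
            continue;
        }
        if (c == '\t')
            c = ' ';
        if (c == ' ') {
            if (nspace < 1) {
                *out++ = c;
                nspace++;
                nlines = 0;
            }
        } else if (c == '\n') {
            if (nlines < 2) {
                *out++ = c;
                nlines++;
                nspace = 0;
            }
        } else {
            *out++ = c;
            nspace = 0;
            nlines = 0;
        }
    }
    if (pending_cr) {             /* input ended with a lone CR */
        *out++ = '\r';
        nspace = 0;
        nlines = 0;
    }
    if (nlines == 0)              /* add a final newline if missing */
        *out++ = '\n';
    *out = '\0';
}
```

The design choice is mostly about where the lookahead lives: the ungetc() version peeks at the stream, while this version keeps one character of history in its state, which is what a pure state machine would need if it could not push characters back.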

I have a utod program that converts Unix-style line endings to Windows-style; I used that in the pipeline to test the new variants of the code.
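The utod program itself is not shown in the answer. A hypothetical minimal stand-in that performs the same basic conversion (every LF becomes CRLF) might look like this; it can be used to turn ordinary test strings into CRLF test input for the variants above.

```c
/* Hypothetical utod-like helper (not the answer's utod program): copies
 * `in` to `out`, turning every '\n' into "\r\n". In the worst case `out`
 * needs room for 2 * strlen(in) + 1 bytes. Existing CR characters are
 * copied unchanged, so input that already has CRLF endings would end up
 * with "\r\r\n". */
void utod_str(const char *in, char *out)
{
    for (; *in != '\0'; in++) {
        if (*in == '\n')
            *out++ = '\r';        /* insert CR before each LF */
        *out++ = *in;
    }
    *out = '\0';
}
```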

Thank you so much! Now it works properly on my Cygwin terminal; it doesn't give me the extra blank line. — dyingStudent, Jun 09 '17 at 05:16
It is funny how it doesn't work on my Cygwin terminal. It doesn't squeeze multiple blank lines. — dyingStudent, Jun 09 '17 at 05:36
On Windows, if the standard input is not in 'text' mode, you will get a `'\r'` and a `'\n'` for each line ending, and the code shown treats `'\r'` as an 'other' character. It can be modified to handle CRLF. One simple way is by ignoring `'\r'` altogether. Another possibility is that there's a Windows-specific way to put standard input into text mode. Or a more radical modification can be made. — Jonathan Leffler, Jun 09 '17 at 05:37
@JonathanLeffler there's [`setmode()`](https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx), but `stdin` should be in text mode by default... — Felix Palmen, Jun 09 '17 at 09:07
@FelixPalmen: thanks for the function name — it isn't one I've needed to memorize. I'm a little surprised that standard input was not in text mode, but maybe it is something to do with Cygwin. OTOH, it seems to be a plausible explanation for what is going wrong because the code shown should work fine if the CRLF to newline mapping was occurring. It could be something else, but I'd be quite surprised. — Jonathan Leffler, Jun 09 '17 at 14:10
