Here's a variant of your code. I eliminated the format() function, incorporating it directly into main() (which is unusual for me, since most programs on SO don't use enough functions). The code now treats spaces and newlines more symmetrically, which fixes the double-increment problem also identified in paddy's answer. It also prints a newline at the end only if there wasn't already a newline at the end, which normalizes files that do not end with a newline. The initialization nlines = 1; deals with multiple newlines at the start of the file; that was well done already.
    #include <stdio.h>

    int main(void)
    {
        int c;
        size_t nlines = 1;
        size_t nspace = 0;

        while ((c = getchar()) != EOF)
        {
            if (c == '\t')
                c = ' ';
            if (c == ' ')
            {
                if (nspace < 1)
                {
                    putchar(c);
                    nspace++;
                    nlines = 0;
                }
            }
            else if (c == '\n')
            {
                if (nlines < 2)
                {
                    putchar(c);
                    nlines++;
                    nspace = 0;
                }
            }
            else
            {
                putchar(c);
                nspace = 0;
                nlines = 0;
            }
        }
        if (nlines == 0)
            putchar('\n');
        return 0;
    }
My testing uses some Bash-specific notations. My program was sb73. The last test input does not include a final newline. The outputs use ⌴ to mark each newline:
    $ echo $'Hello Hi\n\n\nHey\t\tHola\n' | sb73
    Hello Hi⌴
    ⌴
    Hey Hola⌴
    ⌴
    $
and:
    $ echo $'\n\nHello Hi\n\n\n Hey\t\tHola\n' | sb73
    ⌴
    Hello Hi⌴
    ⌴
     Hey Hola⌴
    ⌴
    $
and:
    $ printf '%s' $'\n\nHello Hi\n\n\n Hey\t\tHola' | sb73
    ⌴
    Hello Hi⌴
    ⌴
     Hey Hola⌴
    $
Handling CRLF line endings
The comments identify that the code above doesn't work on a Cygwin terminal, and the plausible reason is that the data being modified has CRLF line endings. There are various ways around this. One is to find a way of forcing the standard input into text mode. In text mode, CRLF line endings should be mapped to Unix-style '\n' (NL or LF only) endings on input, and Unix-style line endings should be mapped to CRLF line endings on output.
Alternatively, it would be possible simply to ignore CR characters:
    --- sb73.c 2017-06-08 22:04:28.000000000 -0700
    +++ sb47.c 2017-06-08 22:40:24.000000000 -0700
    @@ -19,6 +19,8 @@
                         nlines = 0;
                     }
                 }
    +        else if (c == '\r')
    +            continue;       // Windows?
             else if (c == '\n')
             {
                 if (nlines < 2)
That's a 'unified diff' showing two extra lines in the code. Alternatively, it is possible to handle a CR not followed by LF as a regular character and yet handle CR followed by LF as a single newline combination:
    --- sb73.c 2017-06-08 22:04:28.000000000 -0700
    +++ sb59.c 2017-06-08 22:42:43.000000000 -0700
    @@ -19,6 +19,17 @@
                         nlines = 0;
                     }
                 }
    +        else if (c == '\r')
    +        {
    +            c = getchar();
    +            if (c != EOF)
    +                ungetc(c, stdin);   // push back the look-ahead so it is not lost
    +            if (c == '\n')
    +                continue;           // CR of a CRLF pair: drop it
    +            putchar('\r');
    +            nspace = 0;
    +            nlines = 0;
    +        }
             else if (c == '\n')
             {
                 if (nlines < 2)
There's probably a way to write a state machine that handles CR, but that would be more complex.
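For what it's worth, here is a minimal sketch of what such a state machine might look like, assuming that a single extra "saw CR" state is enough. The names squeeze_state, step, and squeeze are hypothetical (they are not in any of the sb programs above), and the sketch filters an in-memory string rather than stdin, purely so that it is easy to test:

```c
#include <stddef.h>

/* Hypothetical state for the sketch; not part of sb73.c. */
struct squeeze_state
{
    size_t nlines;   /* consecutive newlines emitted (starts at 1, as in main()) */
    size_t nspace;   /* consecutive blanks emitted */
    int    saw_cr;   /* a CR was read and its fate is not yet decided */
    char  *out;      /* next free position in the output buffer */
};

/* Emit an ordinary (non-blank, non-newline) character. */
static void emit_regular(struct squeeze_state *s, int c)
{
    *s->out++ = (char)c;
    s->nspace = 0;
    s->nlines = 0;
}

/* Feed one input character through the state machine. */
static void step(struct squeeze_state *s, int c)
{
    if (s->saw_cr)
    {
        s->saw_cr = 0;
        if (c != '\n')
            emit_regular(s, '\r');  /* lone CR: treat as an ordinary character */
        /* otherwise the CR is dropped and the LF is handled below */
    }
    if (c == '\r')
    {
        s->saw_cr = 1;              /* defer the decision until the next character */
        return;
    }
    if (c == '\t')
        c = ' ';
    if (c == ' ')
    {
        if (s->nspace < 1)
        {
            *s->out++ = ' ';
            s->nspace++;
            s->nlines = 0;
        }
    }
    else if (c == '\n')
    {
        if (s->nlines < 2)
        {
            *s->out++ = '\n';
            s->nlines++;
            s->nspace = 0;
        }
    }
    else
        emit_regular(s, c);
}

/* Filter the NUL-terminated string `in` into `out`.
   `out` must have room for at least strlen(in) + 2 bytes. */
void squeeze(const char *in, char *out)
{
    struct squeeze_state s = { 1, 0, 0, out };
    for (; *in != '\0'; in++)
        step(&s, (unsigned char)*in);
    if (s.saw_cr)
        emit_regular(&s, '\r');     /* CR was the last input character */
    if (s.nlines == 0)
        *s.out++ = '\n';            /* normalize a missing final newline */
    *s.out = '\0';
}
```

Because the pending CR is remembered in the state rather than resolved with getchar() and ungetc(), the same step() function could in principle be driven from a getchar() loop, with the end-of-input checks done after the loop.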
I have a utod program that converts Unix-style line endings to Windows-style; I used that in the pipeline to test the new variants of the code.
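If you don't have such a tool to hand, the conversion is simple enough to sketch. The function below is a hypothetical stand-in, not my actual utod program; the name unix_to_dos and the string-based interface are assumptions made for illustration. It writes CRLF for every LF that is not already preceded by CR:

```c
/* Hypothetical stand-in for a Unix-to-DOS line-ending converter.
   Copies the NUL-terminated string `in` to `out`, inserting a CR
   before each LF that does not already follow a CR.
   `out` must have room for at least 2 * strlen(in) + 1 bytes. */
void unix_to_dos(const char *in, char *out)
{
    char prev = '\0';
    for (; *in != '\0'; in++)
    {
        if (*in == '\n' && prev != '\r')
            *out++ = '\r';      /* turn a bare LF into CRLF */
        *out++ = *in;
        prev = *in;
    }
    *out = '\0';
}
```

Data converted this way can serve as test input for the CR-aware variants, which should then produce the same output as they do for the plain LF input.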