I'm trying to reformat some text by removing newline and duplicated space characters.

My input text looks like this:

     hello  ! hello 
you! 

 hello



   world!!! hello


   universe  !

and I'm trying to format it like this:

hello !
hello you!
hello world!
hello universe !

I tried using this command:

awk -v RS='!' '{gsub("^ *|\n", ""); gsub(" +", " ")} NF{print $0 RS }' file

But I still get some spaces at the beginning of the line:

 hello !
hello you!
 hello world!
hello universe !

I don't understand why the first gsub is not removing the leading space (that should be matched by the pattern ^ *).

What is wrong with this awk script?

I'm also interested in the sed command performing the same formatting.

oliv
    It's because `^` matches only at the beginning of a record, and there is a newline between the leading space and `hello`. There shouldn't be a space in front of the first line, though. Use `[[:space:]]` instead. – 123 Jul 07 '16 at 07:00

2 Answers

$ awk -v RS='!' '{gsub(/^[[:space:]]*/, ""); gsub(/[[:space:]]+/, " ")} NF{print $0 RS}' file
hello !
hello you!
hello world!
hello universe !

How it works

  • -v RS='!'

    This sets the record separator to an exclamation point.

  • gsub(/^[[:space:]]*/, "")

    This removes all leading white space from the record, including newlines.

    [[:space:]] is a POSIX character class that matches any white space in the current locale, which includes blanks, tabs, newlines, and some other more obscure white space.

  • gsub(/[[:space:]]+/, " ")

    This replaces each remaining run of white space with a single blank.

  • NF{print $0 RS }

    If the record contains any fields (NF is non-zero), this prints the record followed by the record separator.
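
The difference between the original pattern and the `[[:space:]]` one is easier to see on a single record. The following is only an illustrative sketch: `show_original` and `show_fixed` are hypothetical helper names, and the input is the text that sits between the second and third `!` of the sample file. The square brackets make any leading white space visible.

```shell
# Hypothetical helpers for illustration. The input contains no "!",
# so with RS='!' the whole input is read as a single record.

# Original pattern: "^ *" can only match at the very start of the record,
# so the space that follows the newlines is left over once "\n" is removed.
show_original() {
    awk -v RS='!' '{gsub("^ *|\n", ""); gsub(" +", " "); printf "[%s]\n", $0}'
}

# Fixed pattern: [[:space:]] also matches newlines, so the whole leading
# run of blanks and newlines is removed as one match.
show_fixed() {
    awk -v RS='!' '{gsub(/^[[:space:]]*/, ""); gsub(/[[:space:]]+/, " "); printf "[%s]\n", $0}'
}

printf ' \n\n hello\n\n\n\n   world' | show_original   # prints [ hello world]
printf ' \n\n hello\n\n\n\n   world' | show_fixed      # prints [hello world]
```

The record starts with a space, then newlines, then another space. Only the first space is adjacent to the start of the record, which is why the original command leaves one blank in front of `hello`.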

John1024
  • The `{print $0 RS }` is still needed in order to get the `!` at the end. Anyway, your answer (with the `[[:space:]]`) is good. thanks. – oliv Jul 07 '16 at 07:11
  • maybe also update the output to match with the command (and add the 4 `!`). – oliv Jul 07 '16 at 07:20

In sed

Cmdline

sed ':1;/!/!{$!{N;b1}};s/!\{2,\}/!/;s/\n*//g;s/^ *//;s/ \{1,\}/ /g;s/!/&\n/;/^$/d;P;D' file

Script

:1
/!/!{
        $!{
                N
                b1
        }
}
s/!\{2,\}/!/
s/\n*//g
s/^ *//
s/ \{1,\}/ /g
s/!/&\n/
/^$/d
P
D
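
One possible reading of the script above, written as a commented sketch. It assumes GNU sed (using `\n` in the replacement text is not portable to every sed), and the comments are an interpretation, not the answer author's own explanation. `format_sed` is a hypothetical wrapper name, and the sample input is recreated with `printf` so the block is self-contained.

```shell
# Hypothetical wrapper around the script above; reads text on stdin.
format_sed() {
    sed '
        # Gather input: while the pattern space has no "!" and this is
        # not the last line, append the next line (N) and loop to label 1.
        :1
        /!/!{
            $!{
                N
                b1
            }
        }
        # Squeeze the first run of two or more "!" into one.
        s/!\{2,\}/!/
        # Drop every embedded newline, then leading spaces, then
        # collapse each run of spaces to a single space.
        s/\n*//g
        s/^ *//
        s/ \{1,\}/ /g
        # Put a newline after the first "!" so that P can print up to it.
        s/!/&\n/
        # Nothing left (for example at end of input): print nothing.
        /^$/d
        # P prints up to the first newline; D deletes that part and
        # restarts the script on the remainder without reading input.
        P
        D
    '
}

# Sample input as shown in the question (assumed line layout).
printf '%s\n' '     hello  ! hello ' 'you! ' '' ' hello' '' '' '' \
    '   world!!! hello' '' '' '   universe  !' | format_sed
```

The text that follows a `!` on the same line stays in the pattern space after `D`, so it becomes the start of the next output line once the next `!` has been found.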
123
  • Can you please give some explanation? BTW, there is an extra newline at the end of the output. – oliv Jul 07 '16 at 07:29
  • @oliv Blank lines will be deleted now. As for an explanation, it's pretty self-explanatory if you look at what each command does. – 123 Jul 07 '16 at 07:32
  • hmm... _self explanatory_ ... ok for the `s` substitution. The `/!/!` part is a mystery to me... same for the `P;D` commands... – oliv Jul 07 '16 at 07:43
  • `/!/` matches a single `!` in the pattern space, and the following `!` negates this match, meaning that if the pattern space does not contain `!`, the following block/command is executed. `P` and `D` are documented in the man page. – 123 Jul 07 '16 at 07:47
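
The two constructs asked about in these comments can be tried in isolation. This is a minimal sketch assuming GNU sed; `lines_without_bang` and `split_after_bang` are hypothetical names used only for this illustration.

```shell
# /!/!p : the address /!/ selects lines containing "!", and the trailing
# "!" inverts the selection, so p runs only on lines WITHOUT a "!".
# -n turns off the automatic printing of every line.
lines_without_bang() {
    sed -n '/!/!p'
}

# s/!/&\n/ inserts a newline after the first "!". P prints up to that
# newline, and D deletes the printed part and reruns the script on what
# is left, so one input line is emitted as one piece per "!".
split_after_bang() {
    sed 's/!/&\n/;P;D'
}

printf 'a\nb!\nc\n' | lines_without_bang   # prints a and c on separate lines
printf 'x!y!z\n' | split_after_bang        # prints x!, y! and z on separate lines
```

When no newline is left in the pattern space, `P` prints all of it and `D` behaves like `d`, which is why the final `z` is printed exactly once.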