How can I merge multiple blocks/lines with sed or regex?

Question

Is it possible to merge each block of lines into a single line? Basically, if the next line starts with the same "#Msg" tag, it should be appended to the previous line. It's hard to explain, but my example should speak for itself. The blocks are separated by a blank line.

My input file looks like this:

#Msg,00000

#Msg,00001
#Msg,00002

#Msg,00003
#Msg,00004

#Msg,00005

#Msg,00006
#Msg,00007
#Msg,00008

#Msg,00009

#Msg,00010
#Msg,00011

Output should be like this:

#Msg,00000

#Msg,00001 #Msg,00002

#Msg,00003 #Msg,00004

#Msg,00005

#Msg,00006 #Msg,00007 #Msg,00008

#Msg,00009

#Msg,00010 #Msg,00011

Any advice is very welcome.

Are you specifically tied to `sed` here? Have you made any attempt to solve this yourself or done any research? — Mad Physicist, Dec 29 '17 at 22:19
I don't understand how the `Msg##` is used to group... In the example I see the groups being created based on whether there's a new line between them or not. Care to clarify a bit? — Savir, Dec 29 '17 at 22:24
Mostly I use regex, but I failed here, so I did some research and most people using sed or perl or awk ..so I'm NOT tied to sed. — vollschauer, Dec 29 '17 at 22:24
`awk -v RS="" '{for(i=1;i<=NF;i++){printf("%s ",$i)}print"\n"}' file` gets you part of the way there. I would add a pipe that deletes the blank lines. Good luck. — shellter, Dec 30 '17 at 03:53
Possible duplicate of [Sed to combine N text lines separated by blank lines?](https://stackoverflow.com/questions/39734125/sed-to-combine-n-text-lines-separated-by-blank-lines) — PesaThe, Dec 31 '17 at 12:20

score 0 · Accepted Answer · answered Dec 29 '17 at 22:36

This would be pretty easy to do in Perl:

perl -00 -ple 'tr/\n/ /'

-e CODE specifies the program.

-p wraps a read/write line loop around it (by default it reads from STDIN, but you can also specify one or more filenames on the command line).

-00 specifies that the input "lines" are actually paragraphs.

-l has two effects: Incoming line terminators are automatically stripped from lines, and outgoing lines get line terminators added to them (and because we used -00 (paragraph mode), our line terminator is actually \n\n).

To recap:

We read the input one paragraph at a time. For each paragraph, we remove any trailing newlines. We then translate every newline to a space. Finally we output the transformed paragraph, followed by \n\n.
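
For readers more comfortable with Python, the recap above can be sketched roughly as follows. This is an illustrative equivalent, not a description of what Perl does internally: the function name merge_paragraphs is made up for this example, it reads the whole input into memory instead of streaming one paragraph at a time, and it assumes blank lines are truly empty.

```python
import re


def merge_paragraphs(text: str) -> str:
    """Join the lines of each blank-line-separated paragraph with spaces."""
    # Split on runs of two or more newlines, roughly what -00 treats as a
    # paragraph separator (assumption: blank lines contain no whitespace).
    paragraphs = re.split(r"\n{2,}", text.strip("\n"))
    # Within each paragraph, turn the remaining newlines into spaces,
    # mirroring tr/\n/ /. Empty pieces (from empty input) are skipped.
    merged = [p.replace("\n", " ") for p in paragraphs if p]
    # Emit every paragraph followed by a blank line, like -l in paragraph mode.
    return "".join(p + "\n\n" for p in merged)


if __name__ == "__main__":
    sample = "#Msg,00000\n\n#Msg,00001\n#Msg,00002\n\n#Msg,00003\n#Msg,00004\n"
    print(merge_paragraphs(sample), end="")
```

As with the Perl one-liner, every paragraph (including the last one) is followed by a blank line.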

score 0 · Answer 2 · answered Dec 29 '17 at 23:38

There's no point in trying to write code shorter than what is possible in Perl, so here is a more explicit version in Python.

Collect lines from the input file in a list, group, until a blank line appears. Then output the contents of group, empty it, and start again. When end-of-file is reached, output whatever is left in group, if it is non-empty.

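group = []  # lines of the block currently being collected
with open('vollschauer.txt') as vollschauer:
    for line in vollschauer:
        line = line.rstrip()
        if line:
            # Non-blank line: add it to the current block.
            group.append(line)
        elif group:
            # Blank line: the block is complete, so print it and start over.
            print(' '.join(group))
            print()
            group = []
# Print the final block if the file did not end with a blank line.
if group:
    print(' '.join(group))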

score 0 · Answer 3 · answered Dec 30 '17 at 14:00

$ awk -v RS= -v ORS='\n\n' '{$1=$1}1' file
#Msg,00000

#Msg,00001 #Msg,00002

#Msg,00003 #Msg,00004

#Msg,00005

#Msg,00006 #Msg,00007 #Msg,00008

#Msg,00009

#Msg,00010 #Msg,00011

Ed Morton

PesaThe · Answer 4 · 2017-12-31T00:44:11.013

If you insist on using sed, this should do the trick:

sed -r ':a; N; /^(#[^,]+,).*\n\1/! { P; D }; s/\n/ /; ba' file

It takes different tags into account: lines with different tags won't be merged together (which is what I understood to be the desired behavior):

$ cat file
#Msg,00000
#Msg,00001
#Hello,00002

#Hello,00003
#What,00004
#What,00005
$ sed -r ':a; N; /^(#[^,]+,).*\n\1/! { P; D }; s/\n/ /; ba' file
#Msg,00000 #Msg,00001
#Hello,00002

#Hello,00003
#What,00004 #What,00005

Note that this solution uses GNU sed.
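
If sed is not a hard requirement, the same grouping rule can be sketched in Python. This is only an illustration under a few assumptions: the tag is taken to be what the regex above captures (a '#', then one or more non-comma characters, then a comma), the names tag_of and merge_same_tag are invented for this example, and it is not a line-by-line port of the sed script. Blank or untagged lines are simply passed through unchanged.

```python
import re
from itertools import groupby

# Same tag pattern as in the sed command: '#', non-comma characters, ','.
TAG = re.compile(r"#[^,]+,")


def tag_of(line: str) -> str:
    """Return the tag prefix such as '#Msg,', or '' for untagged lines."""
    match = TAG.match(line)
    return match.group(0) if match else ""


def merge_same_tag(lines):
    """Join runs of consecutive lines that share the same tag."""
    merged = []
    for tag, run in groupby(lines, key=tag_of):
        run = list(run)
        if tag:
            # Consecutive lines with the same tag become one line.
            merged.append(" ".join(run))
        else:
            # Blank or untagged lines are kept as they are.
            merged.extend(run)
    return merged


if __name__ == "__main__":
    sample = [
        "#Msg,00000", "#Msg,00001", "#Hello,00002", "",
        "#Hello,00003", "#What,00004", "#What,00005",
    ]
    print("\n".join(merge_same_tag(sample)))
```

Because itertools.groupby only groups adjacent items, two "#Hello" lines separated by a blank line stay on separate lines, as in the sed output above.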

score 0 · Answer 5 · answered Nov 03 '19 at 12:18

This might work for you (GNU sed):

sed ':a;N;/^$/M!s/\n/ /;ta' file

Gather up lines, replacing each newline with a space, until an empty line is reached.

N.B. The M flag on the regexp /^$/ turns on multiline mode, so the regexp matches an empty line anywhere in a pattern space that contains multiple lines.
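
Since the question also asks about a plain regex, here is a hedged sketch of the same idea using Python's re module: a newline is replaced by a space only when it has a non-newline character directly on both sides, so blank lines and the final newline survive. It assumes the whole file fits in memory and that blank lines are truly empty (no stray spaces); the name join_blocks is made up for this example.

```python
import re

# A newline that sits between two non-newline characters, i.e. one that
# separates two non-empty lines of the same block.
INNER_NEWLINE = re.compile(r"(?<=[^\n])\n(?=[^\n])")


def join_blocks(text: str) -> str:
    """Replace newlines inside a block with spaces, keeping blank lines."""
    return INNER_NEWLINE.sub(" ", text)


if __name__ == "__main__":
    sample = "#Msg,00006\n#Msg,00007\n#Msg,00008\n\n#Msg,00009\n"
    print(join_blocks(sample), end="")
```

The lookbehind and lookahead are zero-width, so consecutive lines in a block of three or more are all joined in a single pass.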
