AWK or sed way to paste non-adjacent lines

Question

$ cat file
aaa bbb ccc
ddd eee
jjj kkk lll
mmm
nnn ooo ppp

The following AWK command will paste the 'mmm' line at the end of the 'ddd eee' line. Is there a simpler way to do this using AWK or sed?

$ awk 'FNR==NR {if (NR==4) foo=$0; next} FNR==2 {print $0" "foo; next} FNR==4 {next} 1' file file
aaa bbb ccc
ddd eee mmm
jjj kkk lll
nnn ooo ppp

To clarify: I want to paste line 4 at the end of line 2 in this particular file, with a single space between the 'ddd eee' and the 'mmm'. That's the task. Is there an AWK or sed solution that's simpler than the one I came up with?

How do you know when the `mmm` line is needed? How do you know when it will appear? Are you trying to make sure the same number of words appear on each line? Will the number of words always be three? What would happen if the line contained `mmm zzz` instead of just `mmm`? What if there were a couple of lines with `ttt` and `uuu` at the end? Should there be `ddd eee ttt` and `mmm zzz uuu`? Etc? The question is, as yet, woefully under-specified and hence not sensibly answerable. – Jonathan Leffler, Aug 28 '16 at 04:15

kdhp · Accepted Answer · 2017-09-26T22:41:28.473

This can be done in sed using the hold space:

sed '2{N;h;N;s/\n.*\n/ /;p;g;D;}' file

2{...} Run the enclosed commands on line two.
N;h;N Append line three to the pattern space, copy both lines to the hold space, then append line four.
s/\n.*\n/ / Substitute a space for the middle line.
p;g;D Print the pasted line, reload lines two and three from the hold space, and delete the first of them. That leaves line three (the one removed by the substitution), which sed prints at the end of the restarted cycle.
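
To see what the pattern space holds at each step of the hold-space command above, the `l` command can print it in an unambiguous form. This is only an inspection sketch: the temporary file and the `before`/`after` variable names are illustrative and not part of the answer.

```shell
# Recreate the five-line sample file from the question.
tmp=$(mktemp)
printf '%s\n' 'aaa bbb ccc' 'ddd eee' 'jjj kkk lll' 'mmm' 'nnn ooo ppp' > "$tmp"

# -n suppresses automatic printing; l shows the pattern space with
# embedded newlines written as \n and the end of the buffer as $.
before=$(sed -n '2{N;h;N;l;}' "$tmp")
after=$(sed -n '2{N;h;N;s/\n.*\n/ /;l;}' "$tmp")

printf 'before s: %s\n' "$before"   # ddd eee\njjj kkk lll\nmmm$
printf 'after s:  %s\n' "$after"    # ddd eee mmm$

rm -f "$tmp"
```

The second line of output shows that the substitution discards line three, which is why the command has to recover it from the hold space afterwards.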

Or, using capture groups (\(...\)) and back-references (\1, \2, etc.):

sed '2{N;N;s/\(\n.*\)\n\(.*\)/ \2\1/;}' file

2{...} Run the enclosed commands on line two.
N;N Append the next two lines to the pattern space.
s/\(\n.*\)\n\(.*\)/ \2\1/ Swap the third and fourth lines, joining the fourth line onto the end of the second.
- \(\n.*\) Capture the third line, including the leading newline.
- \n\(.*\) Capture the fourth line, excluding the leading newline.
- / \2\1/ Replace the matched portion (the third & fourth lines) with a space, followed by the second, and then the first capture groups.
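
As a quick check, the capture-group version can be run against the sample data and compared with the hold-space version. This is a minimal sketch; the temporary file and the `result`/`same` variable names are made up for the demonstration.

```shell
# Recreate the five-line sample file from the question.
tmp=$(mktemp)
printf '%s\n' 'aaa bbb ccc' 'ddd eee' 'jjj kkk lll' 'mmm' 'nnn ooo ppp' > "$tmp"

# \1 holds "\njjj kkk lll" and \2 holds "mmm"; writing " \2\1" puts
# "mmm" on the end of line two and restores line three after it.
result=$(sed '2{N;N;s/\(\n.*\)\n\(.*\)/ \2\1/;}' "$tmp")
printf '%s\n' "$result"

# The hold-space version should produce exactly the same text.
same=$(sed '2{N;h;N;s/\n.*\n/ /;p;g;D;}' "$tmp")
[ "$result" = "$same" ] && echo "both versions agree"

rm -f "$tmp"
```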

The second version is very nice. I have GNU sed, so this works: sed -r '2{N;N;s/(\n.*)\n(.*)/ \2\1/;}' file — user2138595, Aug 28 '16 at 08:10
You can't seriously think this is simpler than the script you started with??? Remember you asked for simpler, not briefer. — Ed Morton, Aug 28 '16 at 16:49

score 2 · Answer 2 · answered Aug 28 '16 at 06:13

This meets the letter of the amended problem statement: it prints line 1, prints line 2 with line 4 appended, then prints line 3, and then prints line 5 and beyond:

awk 'NR == 1 || NR >= 5 { print; next }
     NR == 2 { save2 = $0 }
     NR == 3 { save3 = $0 }
     NR == 4 { print save2, $0; print save3 }' file

It's simpler than the code in the question in that it only scans the file once.

The output:

aaa bbb ccc
ddd eee mmm
jjj kkk lll
nnn ooo ppp
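
The same one-pass idea can be extended to arbitrary line numbers by buffering every line between the target and the source. The following is only a sketch under stated assumptions: `paste_one_pass`, `dst`, `src`, and `buf` are names invented here, and the END rule reflects a guess about what should happen when the file is shorter than `src` (the held lines are printed unchanged).

```shell
# paste_one_pass DST SRC FILE: append line SRC to line DST in one pass.
# (Hypothetical helper; assumes DST < SRC.)
paste_one_pass() {
    awk -v dst="$1" -v src="$2" '
        NR < dst || NR > src { print; next }   # outside the range: pass through
        NR < src { buf[NR] = $0; next }        # lines dst..src-1: hold in memory
        {                                      # NR == src: joined line, then held lines
            print buf[dst], $0
            for (i = dst + 1; i < src; i++) print buf[i]
        }
        END {                                  # file ended before src: flush held lines
            if (NR < src) for (i = dst; i <= NR; i++) print buf[i]
        }
    ' "$3"
}

# Usage with the sample file from the question.
tmp=$(mktemp)
printf '%s\n' 'aaa bbb ccc' 'ddd eee' 'jjj kkk lll' 'mmm' 'nnn ooo ppp' > "$tmp"
result=$(paste_one_pass 2 4 "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The cost of staying with a single pass is that every line from `dst` up to `src` is kept in memory until line `src` arrives.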

answered Aug 28 '16 at 06:13

Jonathan Leffler


The word `simpler` is interesting. I don't find this simpler than parsing the file twice (more efficient, yes, but simpler no) and I find the [accepted answer of sed runes](http://stackoverflow.com/a/39188491/1745001) to be much more complicated. I guess `simpler` is in the eye of the beholder... – Ed Morton Aug 28 '16 at 16:48
The problem statement renders most discussion of the merits of solutions irrelevant. It is a bad (weird, completely ungeneralized) set of requirements. It is likely an [XY Problem](http://mywiki.wooledge.org/XyProblem) that may have been over-simplified (too much of an MCVE). As you say, simplicity is in the eye of the beholder. Your solution is more compact; I don't find it simpler. – Jonathan Leffler Aug 28 '16 at 17:33
I agree the requirements are poorly specified, and I understand that simplicity is in the eye of the beholder. For me, when you talk about requiring code to be "simple", you have to consider ease of maintenance/enhancement, since if you're never going to look at it again, who cares how "simple" it is? In this case, what if in future, instead of lines 2 and 4, we had to combine lines 20 and 40? With the 2-pass solution the OP originally posted, and with the modified version I posted, you'd just change the numbers 2 to 20 and 4 to 40, while with yours and the sed solution it'd be a lot of added code or a rewrite. – Ed Morton Aug 28 '16 at 17:47

score 0 · Answer 3 · answered Aug 28 '16 at 05:51

Solution in TXR:

$ txr -c '@line1
@line2
@line3
@line4
@(data rest)
@(output)
@line1
@line2 @line4
@line3
@  (repeat)
@  rest
@  (end)
@(end)' file
aaa bbb ccc
ddd eee mmm
jjj kkk lll
nnn ooo ppp

answered Aug 28 '16 at 05:51

Kaz


Ed Morton · Answer 4 · score 0 · 2016-08-28T17:53:37.663

This is simpler:

$ awk 'FNR==NR {if (NR==4) foo=$0; next} FNR==2{$0=$0" "foo} FNR!=4' file file
aaa bbb ccc
ddd eee mmm
jjj kkk lll
nnn ooo ppp

Other solutions might be faster or use less memory but they won't be simpler.
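
The point made in the comments about changing 2 to 20 and 4 to 40 can be made explicit by passing the line numbers in with `-v`. This is a sketch rather than part of the original answer: the `paste_line` wrapper and the `dst`/`src` variable names are hypothetical, while the awk logic is the two-pass script shown above.

```shell
# paste_line DST SRC FILE: append line SRC to the end of line DST and drop line SRC.
# (Hypothetical wrapper around the two-pass awk script.)
paste_line() {
    awk -v dst="$1" -v src="$2" '
        FNR == NR { if (NR == src) foo = $0; next }  # first pass: remember line src
        FNR == dst { $0 = $0 " " foo }               # second pass: extend line dst
        FNR != src                                   # print every line except line src
    ' "$3" "$3"
}

# Usage with the sample file from the question.
tmp=$(mktemp)
printf '%s\n' 'aaa bbb ccc' 'ddd eee' 'jjj kkk lll' 'mmm' 'nnn ooo ppp' > "$tmp"
result=$(paste_line 2 4 "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Because the file is read twice, this form needs a regular file rather than a pipe, but `src` may come before or after `dst` without any change to the script.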

edited Aug 28 '16 at 17:53

answered Aug 28 '16 at 17:27

Ed Morton

