Matching patterns across lines

Question

Suppose I have a file which contains:

something
line=1
file=2
other
lines
ignore

something
line=2
file=3
other
lines
ignore

Eventually, I want a unique list of the line and file combinations from all the sections. As a first step, I am trying to get sed to output just those two lines joined into one, like this:

line=1file=2
line=2file=3

Then I can use sort and uniq.

So I am trying:

sed -n -r 's/(line=)(.*?)(\r)(file=)(.*?)(\r)/\1\2\4\5/p' sample.txt

(The value after each = isn't necessarily a number.)

But it won't match across lines. I have tried \n and \r\n in place of \r, but the line-ending style doesn't seem to be the problem, since:

sed -n -r 's/(line=)(.*?)(\r)/\1\2/p' sample.txt

This outputs the "line=" lines, but I can't get it to span the newline and collect the second line as well.

by default, sed operates line by line, so a single regex can never match across multiple lines. some sed implementations support the `-z` option, which makes sed operate on chunks separated by the ASCII NUL character instead of the newline character. there are also sed commands like `n`, `N`, etc. that you can use — Sundeep, Jan 21 '19 at 09:11
also, with `.*?` you might be expecting non-greedy matching, which sed doesn't support at all; you can use perl instead, and perl has the `-0777` option to slurp the entire input as a single string — Sundeep, Jan 21 '19 at 09:13
will the `line=` and `file=` strings always be on consecutive lines, in that order? does this work for you? `sed -n '/line=/{N;s/\n//p}' sample.txt` — Sundeep, Jan 21 '19 at 09:19
Yes, lines will always be in that order. Thanks - it just outputs the "file=" lines for me. — andy1749313, Jan 21 '19 at 09:21
in that case, your input likely has DOS-style line endings (`\r\n`). Either convert the file to Unix style first (https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it) or use `sed -n '/line=/{N;s/\r\n//p}'` — Sundeep, Jan 21 '19 at 09:28
Brilliant - thanks! If you want to post as an answer that is it for me. — andy1749313, Jan 21 '19 at 09:47

score 0 · Accepted Answer · answered Jan 21 '19 at 09:53

By default, sed operates on chunks separated by the \n character (that is, one line at a time), so a plain s command can never match across multiple lines. Some sed implementations, such as GNU sed, support the -z option, which makes sed operate on chunks separated by the ASCII NUL character instead of the newline character. This can work for small files, since a file with no NUL characters is then read as a single chunk, assuming NUL characters won't affect the pattern you want to match.
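A hedged sketch of the -z approach, assuming GNU sed (-z is a GNU extension), Unix line endings, and that each file= line directly follows its line= line as stated in the question. The helper name join_pairs_z is hypothetical. Because the whole file is one chunk, the newline in front of each file= line can simply be deleted, and grep then keeps the joined lines.

```shell
# join_pairs_z is a hypothetical helper name used for illustration.
# Requires GNU sed: -z splits input on NUL, so a file without NUL bytes
# is processed as a single chunk and "\n" is matched like any character.
join_pairs_z() {
  # Delete every newline that is immediately followed by "file=",
  # joining it onto the preceding "line=" line, then keep those lines.
  sed -z 's/\nfile=/file=/g' "$@" | grep '^line='
}
```

For example, join_pairs_z sample.txt | sort -u should produce the unique list. The entire file is held in memory, which is why this suits small files.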

There are also sed commands that can be used for multiline processing, for example:

sed -n '/line=/{N;s/\n//p}'

The N command appends a newline and the next input line to the chunk currently being processed (the pattern space, which here is a line that matched line=)
s/\n//p then deletes that embedded newline and prints the result, so each pair is output as a single line
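Combining this command with the sort and uniq step from the question gives a pipeline like the sketch below. The helper name join_pairs is hypothetical, and sort -u is used as the equivalent of sort | uniq.

```shell
# join_pairs is a hypothetical helper name used for illustration.
join_pairs() {
  # -n suppresses automatic printing. On each line containing "line=",
  # N appends "\n" plus the next line, s/\n// removes that newline,
  # and the p flag prints the joined line. sort -u removes duplicates.
  sed -n '/line=/{N;s/\n//p}' "$@" | sort -u
}
```

Usage: join_pairs sample.txt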

If your input has DOS-style line endings (\r\n), first convert it to Unix style (see Why does my tool output overwrite itself and how do I fix it?) or take care of the \r as well:

sed -n '/line=/{N;s/\r\n//p}'
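With CRLF input, s/\r\n//p still leaves the \r from the file= line at the end of each joined line. A variant that accepts both LF and CRLF input and also strips that trailing \r might look like this. It is a sketch assuming GNU sed, since \r and \? are GNU extensions in basic regular expressions; join_pairs_crlf is a hypothetical name.

```shell
# join_pairs_crlf is a hypothetical helper name; assumes GNU sed.
join_pairs_crlf() {
  # \r\? makes the carriage return before the embedded newline optional,
  # so LF and CRLF input both work; s/\r$// removes the \r that CRLF
  # input leaves at the end of the joined line; p then prints it.
  sed -n '/line=/{N;s/\r\?\n//;s/\r$//;p}' "$@"
}
```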

Note that these commands were tested with GNU sed; syntax may vary for other implementations.
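One likely difference is brace syntax: BSD sed (as on macOS) expects a ; or newline before the closing }, as POSIX describes, while GNU sed does not. A spelling that should also work there, assuming an otherwise POSIX-conforming sed, is sketched below; join_pairs_posix is a hypothetical name.

```shell
# join_pairs_posix is a hypothetical helper name used for illustration.
# The ";" before "}" is what POSIX and BSD sed expect; GNU sed accepts it too.
# "\n" in the regex matches the newline that N embeds in the pattern space.
join_pairs_posix() {
  sed -n '/line=/{N;s/\n//p;}' "$@"
}
```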
