How to extract text between two tokens in a text file using bash and manipulate the output

Question

I have a large text file containing blocks of text between two tokens. I want to extract these blocks and put them into a new file, with each extracted block on a single line (one block per line).

I used this solution: Extract lines between 2 tokens in a text file using bash

sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba}' inputfile

and it worked almost perfectly. The problem is that each extracted block spans two lines, and I want to condense it into one line. How can I achieve this?

Example:

<token1>
text to
extract
<token2>
<token1>
text to
extract
<token2>

The output should look like:

text to extract
text to extract
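For reference, the sed command borrowed in the question can be spelled out with comments. This is a sketch that assumes GNU sed and the example's literal `<token1>`/`<token2>` lines as markers; `extract_blocks` is a hypothetical helper name. Run on the example input, it reproduces the two-line blocks the question describes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines strictly between <token1> and <token2>.
# Assumes GNU sed; reads standard input.
extract_blocks() {
  sed -n '
    /<token1>/{
      # Loop label: we are inside a block.
      :a
      # Replace the pattern space with the next line (-n suppresses the print).
      n
      # Closing token reached: branch to the end of the script, print nothing.
      /<token2>/b
      # Any other line is block content: print it.
      p
      # Go back and read the next line.
      ba
    }
  '
}

printf '%s\n' '<token1>' 'text to' 'extract' '<token2>' \
              '<token1>' 'text to' 'extract' '<token2>' | extract_blocks
# text to
# extract
# text to
# extract
```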

You can use `awk -v RS="" '{$1=$1} /./{print $0}' inputfile` if your tokens really look the way you showed them in the question. — P...., Nov 22 '16 at 06:52
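The comment's awk one-liner relies on awk's paragraph mode: with `RS=""`, records are separated by blank lines, and a newline always acts as a field separator, so `$1=$1` rebuilds each record with single spaces. This only helps if blocks are separated by blank lines rather than literal token lines; the sketch below assumes such input, and `join_paragraphs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Assumed input layout: blocks separated by blank lines instead of token lines.
# Paragraph mode (RS="") makes each block one record; $1=$1 rebuilds the record,
# joining its fields (including the former newlines) with OFS, a single space.
join_paragraphs() {
  awk -v RS="" '{$1=$1} /./{print $0}'
}

printf 'text to\nextract\n\ntext to\nextract\n' | join_paragraphs
# text to extract
# text to extract
```

On input with literal `<token1>`/`<token2>` lines and no blank lines, the whole file is a single record, so everything, tokens included, would be printed on one line.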

Seephor · Accepted Answer · 2016-11-21T23:48:38.937 · score 0

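I was able to achieve this by running a second command on the output of the first solution:

sed -e '/pattern/N;y/\n/\t/'

Here `pattern` stands for a regex matching the first line of each extracted block: `N` appends the following line to it, and `y/\n/\t/` replaces the embedded newline with a tab, so each two-line block ends up on one line.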
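The two steps can be chained in one pipeline. A minimal sketch, assuming GNU sed, the example's `<token1>`/`<token2>` markers, and blocks of exactly two lines; it joins each pair with `N;s/\n/ /` so the halves are separated by a space, as in the question's desired output. The extraction loop is written across lines so it does not depend on how a particular sed parses `;` and `}` after labels. `extract_joined` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the blocks, then join each pair of lines.
# Assumes GNU sed and that every block is exactly two lines long.
extract_joined() {
  sed -n '/<token1>/{
    :a
    n
    /<token2>/b
    p
    ba
  }' |
    sed 'N;s/\n/ /'   # append the next line, then turn the newline into a space
}

printf '%s\n' '<token1>' 'text to' 'extract' '<token2>' \
              '<token1>' 'text to' 'extract' '<token2>' | extract_joined
# text to extract
# text to extract
```

For the pairing step, `paste -d ' ' - -` would also work, since it reads standard input two lines at a time and joins them with a space.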

edited Nov 21 '16 at 23:48

answered Nov 21 '16 at 23:17

Seephor

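Alternatively, extraction and joining can be done in a single sed pass by collecting each block in the hold space. A sketch assuming GNU sed and the example's markers (`extract_one_pass` is a hypothetical name); unlike pairing lines with `N`, it does not depend on blocks being exactly two lines long.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one sed pass that collects each block in the hold space
# and prints it as a single space-separated line. Assumes GNU sed.
extract_one_pass() {
  sed -n '
    /<token1>/{
      :a
      n
      /<token2>/{
        # Swap: pattern space = collected lines, hold space = token line.
        x
        # H left a leading newline; drop it, then join the rest with spaces.
        s/^\n//
        s/\n/ /g
        p
        # Clear the joined text and swap it back, leaving the hold space empty.
        s/.*//
        x
        b
      }
      # Append the content line to the hold space.
      H
      ba
    }
  '
}

printf '%s\n' '<token1>' 'text to' 'foo' 'extract' '<token2>' \
              '<token1>' 'text to' 'extract' '123' 'bar foo' 'baz' '<token2>' | extract_one_pass
# text to foo extract
# text to extract 123 bar foo baz
```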

score 0 · Answer 2 · answered Nov 22 '16 at 03:48

awk is better suited than sed for this sort of text processing:

$ cat ip.txt 
<token1>
text to
foo
extract
<token2>
<token1>
text to
extract
123
bar foo
baz
<token2>

$ awk '/<token1>/{f=1; next} /<token2>/{print a; a=""; f=0} f{a = a ? a" "$0 : $0}' ip.txt 
text to foo extract
text to extract 123 bar foo baz
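The awk one-liner above can also be written out with comments. This hypothetical `join_blocks_awk` helper follows the same idea but keeps a line counter `n` to decide when a space is needed: the one-liner's `a ? ... : ...` tests the collected text itself, and a block whose first line is `0` can then be treated as false, because awk compares numeric-looking input as a number. It also prints at `<token2>` only when a block is open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same approach as the one-liner, spelled out.
# The counter n (not the buffer a) decides whether a separator is needed.
join_blocks_awk() {
  awk '
    /<token1>/ { f = 1; a = ""; n = 0; next }   # start of block: reset buffer
    /<token2>/ { if (f) print a; f = 0; next }  # end of block: emit joined line
    f          { a = (n++ ? a " " : "") $0 }    # inside block: append with space
  ' "$@"
}

printf '%s\n' '<token1>' 'text to' 'foo' 'extract' '<token2>' \
              '<token1>' 'text to' 'extract' '123' 'bar foo' 'baz' '<token2>' | join_blocks_awk
# text to foo extract
# text to extract 123 bar foo baz
```

Because of `"$@"`, it can also be called with file names, e.g. `join_blocks_awk ip.txt`.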

A similar solution with Perl:

$ perl -lne 'if(/<token1>/){$f=1; next} if(/<token2>/){print "@a"; undef @a; $f=0}; push(@a,$_) if $f' ip.txt 
text to foo extract
text to extract 123 bar foo baz

or, collecting the lines in a string instead of an array:

$ perl -lne 'if(/<token1>/){$f=1; next} if(/<token2>/){print $a; $a=""; $f=0}; $a .= $a?" $_":$_ if $f' ip.txt 
text to foo extract
text to extract 123 bar foo baz
