
I have a large text file containing blocks of text between two tokens. I want to extract each block and write it to a new file, with each extracted block on a single line (one line per block).

I used this solution: Extract lines between 2 tokens in a text file using bash

sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba}' inputfile

and it worked almost perfectly. The problem is that each extracted block spans two lines, and I want to condense it to one line. How can I achieve this?

Example:

<token1>
text to
extract
<token2>
<token1>
text to
extract
<token2>

output should look like:

text to extract
text to extract
Seephor
  • You can use `awk -v RS="" '{$1=$1} /./{print $0}' inputfile` if your tokens really are as shown in the question. – P.... Nov 22 '16 at 06:52

2 Answers


I was able to achieve this by separately running:

sed -e '/pattern/N;y/\n/\t/'

on the file produced by the first solution.
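The two steps can probably be folded into a single sed pass. The following is only a sketch, assuming the simplified `<token1>` / `<token2>` markers from the question; `join_blocks` is a hypothetical helper name, not something from the original answer, and it joins lines with a space rather than a tab. It collects the lines of each block in the hold space and prints them as one line when the closing token is reached.

```shell
# join_blocks (hypothetical name): reads stdin, prints each block
# found between <token1> and <token2> as one space-separated line.
join_blocks() {
  sed -n '
/<token1>/,/<token2>/{
  # Opening token: empty the hold space and skip the token line.
  /<token1>/{
    s/.*//
    h
    d
  }
  # Closing token: fetch the collected lines, join them, print.
  /<token2>/{
    x
    s/^\n//
    s/\n/ /g
    p
    d
  }
  # Any other line inside the block: append it to the hold space.
  H
}'
}

# Usage with the sample data from the question.
printf '%s\n' '<token1>' 'text to' 'extract' '<token2>' \
              '<token1>' 'text to' 'extract' '<token2>' | join_blocks
```

Because the block is accumulated rather than joined pairwise, this should also handle blocks of more than two lines.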

Seephor

awk is better suited to this sort of text processing than sed:

$ cat ip.txt 
<token1>
text to
foo
extract
<token2>
<token1>
text to
extract
123
bar foo
baz
<token2>

$ awk '/<token1>/{f=1; next} /<token2>/{print a; a=""; f=0} f{a = a ? a" "$0 : $0}' ip.txt 
text to foo extract
text to extract 123 bar foo baz
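The real tokens in the question are HTML comments rather than `<token1>` and `<token2>`. As a hedged variation on the awk command above, the tokens can be passed in as variables and matched as literal strings with `index()`, so regex metacharacters in a token do not need escaping. The function name `extract_blocks` and the variable names `start`, `stop`, `inside`, and `buf` are illustrative choices, not part of the original answer.

```shell
# extract_blocks (hypothetical name): reads stdin and prints each block
# between the start and stop tokens as one space-separated line.
extract_blocks() {
  awk -v start='<!-- this is token 1 -->' \
      -v stop='<!-- this is token 2 -->' '
    # Line containing the start token: begin a new, empty block.
    index($0, start) { inside = 1; buf = ""; next }
    # Line containing the stop token: print the block collected so far.
    index($0, stop)  { if (inside) print buf; inside = 0; next }
    # Line inside a block: append it, space-separated.
    inside           { buf = (buf == "" ? $0 : buf " " $0) }
  '
}

# Usage with a small sample built from the tokens in the question.
printf '%s\n' '<!-- this is token 1 -->' 'text to' 'foo' 'extract' \
              '<!-- this is token 2 -->' | extract_blocks
```

Comparing `buf` against the empty string, instead of testing its truthiness, keeps a block whose first line is `0` from being dropped when the next line is appended.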


A similar solution with perl:

$ perl -lne 'if(/<token1>/){$f=1; next} if(/<token2>/){print "@a"; undef @a; $f=0}; push(@a,$_) if $f' ip.txt 
text to foo extract
text to extract 123 bar foo baz

or

$ perl -lne 'if(/<token1>/){$f=1; next} if(/<token2>/){print $a; $a=""; $f=0}; $a .= $a?" $_":$_ if $f' ip.txt 
text to foo extract
text to extract 123 bar foo baz
Sundeep