How to extract multiple patterns between tokens at once with sed?

Question

Let's assume that I have a file named inputFile that looks like this:

blahblah token substring token something else token substring2 token

The whole file contains only one long line.

I want to extract the substrings between the tokens with sed (substring, substring2).

At the moment I have:

sed "s/^.* \?token\(.* \)token.* \?/\1/" inputFile > outputFile

I tried to do this based on the following questions, but unfortunately it returns only the last substring:

Extract lines between 2 tokens in a text file using bash

How to replace multiple patterns at once with sed?

How to select lines between two patterns?

Answers with an explanation would be great.

UPDATE: the real input:

<archive><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>52333</text><sendTime>554</sendTime><deliveryTime>765</deliveryTime></message><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>4332</text><sendTime>764</sendTime><deliveryTime>922</deliveryTime></message></archive>

Expected output:

apr gtr 52333
apr gtr 4332

you'll have to add on what basis output is arrived at.. also, if it is valid xml, use an xml parser like xmlstarlet or a programming language with xml module.. — Sundeep, May 23 '18 at 10:09
@MariuszBakun I have updated my answer with an `xmlstarlet` cmd that provides the requested output. — kvantour, May 23 '18 at 10:20

kvantour · Accepted Answer · 2018-05-23T10:19:36.107

The problem is that sed's regex matching is greedy: the leading `^.*` consumes as much of the line as it can, so the above command returns only substring2, even if you add the global flag (g).
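The greedy behaviour can be checked with a small sketch. It uses a simplified form of the command from the question (the optional-space parts are dropped), and the variable names `line`, `greedy` and `greedy_g` are only illustrative:

```shell
line='blahblah token substring token something else token substring2 token'

# The leading ".*" is greedy: it swallows everything up to the last
# pair of tokens, so only the final substring is captured.
greedy=$(printf '%s\n' "$line" | sed 's/^.*token\(.*\)token.*$/\1/')

# The g flag changes nothing: the single match already covers the
# whole line, so there is nothing left for a second match.
greedy_g=$(printf '%s\n' "$line" | sed 's/^.*token\(.*\)token.*$/\1/g')

printf '[%s]\n' "$greedy" "$greedy_g"
```

Both variables should end up holding only the text around substring2, which is the result described in the question.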

You could use awk for this, redefining the field separator FS to be the string token. This way your substrings end up in the even-numbered fields:

$ echo "blahblah token substring token something else token substring2 token"  | \
  awk -F 'token' '{for(i=2;i<=NF;i+=2) {print $i}}'
 substring 
 substring2
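If you want to stay with sed, the same "every second piece" idea can be sketched as follows. This is only one possible approach, not a definitive one: `extract_between` is a hypothetical helper name, and it assumes the tokens come in pairs and that the string token does not occur inside other words.

```shell
# Hypothetical helper: print every piece that sits between an opening
# and a closing "token", one piece per line.
extract_between() {
  # 1st sed: replace each "token" (plus surrounding spaces) with a newline.
  #          Backslash followed by a literal newline is the portable way
  #          to write a newline in the replacement.
  # 2nd sed: with -n, "n;p" skips one line and prints the next one,
  #          so only the even-numbered lines are printed.
  sed 's/ *token */\
/g' | sed -n 'n;p'
}

result=$(printf '%s\n' 'blahblah token substring token something else token substring2 token' | extract_between)
printf '%s\n' "$result"
```

Unlike the awk output above, the surrounding spaces are removed here, because they are matched together with the token.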

update:

If your input is an XML file, you might want to use an XML parser such as xmlstarlet. Given this input:

<archive>
   <message id="0">
       <receiver>apr</receiver>
       <sender>gtr</sender>
       <text>52333</text>
       <sendTime>554</sendTime>
       <deliveryTime>765</deliveryTime>
   </message>
   <message id="0">
       <receiver>apr</receiver>
       <sender>gtr</sender>
       <text>4332</text>
       <sendTime>764</sendTime>
       <deliveryTime>922</deliveryTime>
   </message>
</archive>

the command would be:

xmlstarlet sel -t -m '//message' -v receiver -o " " -v sender -o " " -v text -n <file>

which outputs

apr gtr 52333
apr gtr 4332
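If xmlstarlet is not available, a sed-only fallback for this particular one-line input could look like the sketch below. It is fragile by design: it assumes that receiver, sender and text always appear in exactly this order, directly after one another, and that their content contains no `<`. The variable names `xml` and `rows` are illustrative.

```shell
xml='<archive><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>52333</text><sendTime>554</sendTime><deliveryTime>765</deliveryTime></message><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>4332</text><sendTime>764</sendTime><deliveryTime>922</deliveryTime></message></archive>'

# 1st sed: put every message on its own line by turning each closing
#          </message> tag into a newline (backslash + literal newline).
# 2nd sed: on lines containing the three elements, keep only their
#          contents; -n together with the p flag drops all other lines.
rows=$(printf '%s\n' "$xml" |
  sed 's/<\/message>/\
/g' |
  sed -n 's/.*<receiver>\([^<]*\)<\/receiver><sender>\([^<]*\)<\/sender><text>\([^<]*\)<\/text>.*/\1 \2 \3/p')

printf '%s\n' "$rows"
```

Splitting on `</message>` first keeps the greedy `.*` confined to a single message, which avoids the problem described at the top of this answer.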
