
Let's assume that I have a file named inputFile which looks like this:

blahblah token substring token something else token substring2 token

The whole file contains only one long line.

I want to extract the substrings between the tokens with sed (substring and substring2).

At the moment I have:

sed "s/^.* \?token\(.* \)token.* \?/\1/" inputFile > outputFile

Unfortunately, this returns only the last substring. I based my attempt on these questions:

Extract lines between 2 tokens in a text file using bash

How to replace multiple patterns at once with sed?

How to select lines between two patterns?

Answers with an explanation would be great.

UPDATE: the real input:

<archive><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>52333</text><sendTime>554</sendTime><deliveryTime>765</deliveryTime></message><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>4332</text><sendTime>764</sendTime><deliveryTime>922</deliveryTime></message></archive>

Expected output:

apr gtr 52333
apr gtr 4332

kvantour


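The problem is that .* in sed is greedy: the leading ^.* swallows as much of the line as it can, leaving only the last token ... token pair for \(.* \), so the command returns only substring2. Adding the global flag (g) does not help, because that single match already consumes the whole line.

You could use awk for this instead, by setting the field separator FS to the string token. That way your substrings end up in the even-numbered fields: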
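If you want to stay with sed, one way around the greediness (a sketch, not the only option) is to never let a single match span more than one token: first replace every token, together with the spaces around it, by a newline, then print every second line. The function name extract_between_tokens is purely illustrative; with your file you would run it as extract_between_tokens < inputFile > outputFile. It assumes the tokens come in opening/closing pairs and that the word token does not occur inside the text you want to keep.

```shell
# Hypothetical helper: print the text between pairs of "token".
# Step 1: turn every "token" (plus surrounding spaces) into a newline.
#         The backslash-newline in the replacement is the portable way
#         to insert a newline with sed, so "/g'" must start in column 0.
# Step 2: with -n, 'n;p' reads the next line and prints it, i.e. it
#         prints lines 2, 4, 6, ... which hold the wanted substrings.
extract_between_tokens() {
  sed 's/ *token */\
/g' | sed -n 'n;p'
}

echo "blahblah token substring token something else token substring2 token" |
  extract_between_tokens
# substring
# substring2
```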

$ echo "blahblah token substring token something else token substring2 token"  | \
  awk -F 'token' '{for(i=2;i<=NF;i+=2) {print $i}}'
 substring 
 substring2
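The spaces around each substring in the output above come from the input, since only the literal string token is used as the separator. A small variant, assuming you want those spaces stripped, passes a regular expression as the separator instead (POSIX awk treats a multi-character FS as an extended regular expression). The function name extract_trimmed is hypothetical:

```shell
# Hypothetical helper: same idea as above, but FS is the ERE ' *token *',
# so the spaces next to each token become part of the separator.
# Fields 2, 4, ... hold the substrings (assumes tokens come in pairs).
extract_trimmed() {
  awk -F ' *token *' '{ for (i = 2; i <= NF; i += 2) print $i }'
}

echo "blahblah token substring token something else token substring2 token" |
  extract_trimmed
# substring
# substring2
```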

update:

If your input is an XML file, you are better off using an XML-aware tool such as xmlstarlet. Formatted for readability, your input is:

<archive>
   <message id="0">
       <receiver>apr</receiver>
       <sender>gtr</sender>
       <text>52333</text>
       <sendTime>554</sendTime>
       <deliveryTime>765</deliveryTime>
   </message>
   <message id="0">
       <receiver>apr</receiver>
       <sender>gtr</sender>
       <text>4332</text>
       <sendTime>764</sendTime>
       <deliveryTime>922</deliveryTime>
   </message>
</archive>

which leads to the command:

xmlstarlet sel -t -m '//message' -v receiver -o " " -v sender -o " " -v text -n <file>

which outputs

apr gtr 52333
apr gtr 4332
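If xmlstarlet is not available, and the file really is a single line with exactly the flat layout shown in the question, sed can still do it by splitting first: start a new line at each <message ...> tag so that .* can never span two messages, then pick out the three fields on each line. This is only a sketch tied to that layout (receiver, sender and text in that order, no extra whitespace or nested tags inside them); a regular expression is not an XML parser, so for anything less regular the xmlstarlet command above is the safer choice. extract_messages is a hypothetical name.

```shell
# Hypothetical helper; relies on the exact flat layout from the question.
extract_messages() {
  # 1) Replace every opening <message ...> tag with a newline, so each
  #    message sits on its own line (</message> is not matched).
  # 2) On each line, capture receiver, sender and text and print them
  #    space-separated; -n suppresses lines that do not match (<archive>).
  sed 's/<message[^>]*>/\
/g' |
    sed -n 's:.*<receiver>\([^<]*\)</receiver><sender>\([^<]*\)</sender><text>\([^<]*\)</text>.*:\1 \2 \3:p'
}

echo '<archive><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>52333</text><sendTime>554</sendTime><deliveryTime>765</deliveryTime></message><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>4332</text><sendTime>764</sendTime><deliveryTime>922</deliveryTime></message></archive>' |
  extract_messages
# apr gtr 52333
# apr gtr 4332
```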