I would like to either use a grep command or just know the regex to get the following string between the ">" and "<" characters.

string :

<f id=mos-title>demo-break-1</f>

I would like to return

demo-break-1
Gilles Quénot

3 Answers

Suppose the file foo contains:

<f id=mos-title>demo-break-1</f>
<f id=mos-title>demo-break-2</f>
<f id=mos-title>demo-break-3</f>
<a>foo testing</a>

You could do something like this:

perl -ne 'print "$1\n" if /<.+id=mos-title>(.+?)<\/f>/' foo

Keep in mind that this only matches when the opening tag, the text, and the closing tag are all on the same line. You will also have to account for any deviations in the format yourself, since this is a regex and not a real HTML parser.

Here's a more lenient version that tolerates extra attributes after the id and whitespace around the text, but it is still not 100% HTML compliant.

perl -ne 'print "$1\n" if /<.+id=mos-title\b.*?>\s*(.+?)\s*<\/f>/' foo

Output would be as follows:

demo-break-1
demo-break-2
demo-break-3
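
Since the question also asks about grep, here is a rough sketch of the same idea with grep and sed. It assumes a grep that supports -o (GNU and BSD grep do), and it builds its own temporary sample file; the variable names tmp and titles are only for illustration. Like the perl version, it only works when the whole element is on one line.

```shell
# Hypothetical sample file, mirroring the example above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<f id=mos-title>demo-break-1</f>
<f id=mos-title>demo-break-2</f>
<a>foo testing</a>
EOF

# grep -o prints only the matched part, i.e. "id=mos-title>TEXT<".
# sed then strips everything up to the first ">" and the trailing "<".
titles=$(grep -o 'id=mos-title>[^<]*<' "$tmp" | sed 's/^[^>]*>//; s/<$//')
printf '%s\n' "$titles"

rm -f "$tmp"
```

The `<a>` line is skipped because the pattern requires the id to appear directly before the closing ">" of the opening tag.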
cmevoli

If you have a proper XML document like this:

<root>
  <f id="mos-title">demo-break-1</f>
</root>

you can use a proper parser:

xmllint --xpath "/root/f[@id='mos-title']" input.xml | \
      sed 's/[^>]*>\([^<]*\)<[^>]*>/\1\n/g'

With the input you have, if you are sure that the input format is consistent (i.e., generated), you can use sed:

sed 's/[^>]*>\([^<]*\)<[^>]*>/\1/g' input
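
Note that the expression above does not look at the tag name or the id, so it strips the tags from every matching line. If only the mos-title elements are wanted, a stricter variant could look like the sketch below. It assumes exactly one element per line, written exactly as in the question; the sample data and the variable name titles are made up for the demonstration.

```shell
# Hypothetical sample input: two lines, only the first is a mos-title element.
input='<f id=mos-title>demo-break-1</f>
<a>foo testing</a>'

# -n suppresses default output; the trailing "p" prints only lines
# where the substitution succeeded, so non-matching lines are dropped.
titles=$(printf '%s\n' "$input" | sed -n 's/^<f id=mos-title>\([^<]*\)<\/f>$/\1/p')
printf '%s\n' "$titles"
```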
perreal

It is usually best to use an XML parser, but you could try this awk:

awk '$1==s{print $2}' s="f id=mos-title" RS=\< FS=\> file
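
To see why this works: with RS set to "<", every record starts right after an opening angle bracket, and with FS set to ">", $1 is the content of the tag and $2 is the text that follows it. The sketch below runs the same logic on a single sample line read from standard input, using -v and a BEGIN block instead of command-line assignments; the variable names line and title are only for illustration.

```shell
# Hypothetical one-line input, matching the string from the question.
line='<f id=mos-title>demo-break-1</f>'

# Records for this input are: "", "f id=mos-title>demo-break-1", "/f>".
# Only the second record has $1 equal to s, so only its $2 is printed.
title=$(printf '%s\n' "$line" |
  awk -v s='f id=mos-title' -F'>' 'BEGIN { RS = "<" } $1 == s { print $2 }')
printf '%s\n' "$title"
```

Because the comparison is an exact string match on the tag contents, a tag with extra attributes or different spacing would not be selected.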
Scrutinizer