I would like to either use a grep command or just know the regex to extract the string between the ">" and "<" characters.
String:
<f id=mos-title>demo-break-1</f>
I would like to return
demo-break-1
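Since the question asks for grep, here is a minimal sketch. The first command assumes GNU grep, because -P (Perl-compatible regexes with lookaround) is a GNU option that also needs PCRE support compiled in. The second command is a fallback for greps without -P: it matches the delimiters too and strips them with sed. The variable names are only for illustration.

```shell
# Sample input from the question
s='<f id=mos-title>demo-break-1</f>'

# GNU grep: -o prints only the matched text, -P enables Perl-compatible regexes.
# The lookbehind (?<=>) and lookahead (?=<) require a ">" before and a "<" after
# the match without including either character in the output.
with_pcre=$(printf '%s\n' "$s" | grep -oP '(?<=>)[^<]+(?=<)')

# Without -P: match ">...<" including the delimiters, then strip them with sed.
without_pcre=$(printf '%s\n' "$s" | grep -o '>[^<]*<' | sed 's/^>//; s/<$//')

printf '%s\n' "$with_pcre" "$without_pcre"
```

Both commands print demo-break-1. On a file you would pass the file name instead of piping, e.g. grep -oP '(?<=>)[^<]+(?=<)' foo, but note that this prints the text of every element, not only the mos-title ones.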
Suppose file foo contains:
<f id=mos-title>demo-break-1</f>
<f id=mos-title>demo-break-2</f>
<f id=mos-title>demo-break-3</f>
<a>foo testing</a>
You could do something like this:
perl -ne 'print "$1\n" if /<.+id=mos-title>(.+?)<\/f>/' foo
Keep in mind that this only works when each element, from its opening tag to its closing </f>, sits on a single line. You will also have to account for any deviations in the format, since this is not a real HTML parser.
Here's a more relaxed version; it tolerates extra attributes after the id and whitespace around the text, but it is still not fully HTML compliant:
perl -ne 'print "$1\n" if /<.+id=mos-title\b.*?>\s*(.+?)\s*<\/f>/' foo
Output would be as follows:
demo-break-1
demo-break-2
demo-break-3
If you have a proper xml document like this:
<root>
<f id="mos-title">demo-break-1</f>
</root>
you can use a proper parser:
xmllint --xpath "/root/f[@id='mos-title']" input.xml | \
sed 's/[^>]*>\([^<]*\)<[^>]*>/\1\n/g'
With the input you have, if you are sure that the input format is consistent (i.e., generated), you can use sed:
sed 's/[^>]*>\([^<]*\)<[^>]*>/\1/g' input
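Note that the sed command above rewrites every line it sees, so with the foo sample the <a>foo testing</a> line would come through as foo testing. A sketch that prints only the mos-title values, assuming the same one-element-per-line layout and the exact spelling <f id=mos-title>:

```shell
# Sample input: the contents of foo from the question
input='<f id=mos-title>demo-break-1</f>
<f id=mos-title>demo-break-2</f>
<f id=mos-title>demo-break-3</f>
<a>foo testing</a>'

# -n turns off automatic printing; the p flag prints a line only when the
# substitution succeeded, so lines without <f id=mos-title>...</f> are dropped.
result=$(printf '%s\n' "$input" | sed -n 's/.*<f id=mos-title>\([^<]*\)<\/f>.*/\1/p')

printf '%s\n' "$result"
```

On the real file this becomes sed -n 's/.*<f id=mos-title>\([^<]*\)<\/f>.*/\1/p' foo, which prints demo-break-1 to demo-break-3 and skips the <a> line.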
It is usually best to use an XML parser, but you could try this awk:
awk '$1==s{print $2}' s="f id=mos-title" RS=\< FS=\> file
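To see why the awk one-liner works, the sketch below prints the first two fields of every non-empty record for a two-line sample; the brackets only make empty fields visible, and the sub() call strips the newline that ends each input line. With RS set to "<", every tag starts a new record, and with FS set to ">", $1 holds the tag's contents and $2 the text after it, so only the record whose $1 is exactly f id=mos-title passes the $1==s test.

```shell
# Sample input: one mos-title element and one unrelated element
input='<f id=mos-title>demo-break-1</f>
<a>foo testing</a>'

# RS="<" starts a new record at every "<"; FS=">" splits each record into
# the tag contents ($1) and the text that follows the tag ($2).
result=$(printf '%s\n' "$input" | awk 'BEGIN { RS = "<"; FS = ">" }
  NF > 0 {
    sub(/\n$/, "", $2)   # drop the newline that ends each input line
    printf "[%s] [%s]\n", $1, $2
  }')

printf '%s\n' "$result"
```

The output lists [f id=mos-title] [demo-break-1], [/f] [], [a] [foo testing] and [/a] [], which shows that the closing tags and the <a> element never satisfy the comparison.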