The sed command sed -n '/pattern1/,/pattern2/p' does extract the lines between pattern1 and pattern2, inclusive, provided the two patterns are located on separate lines.
For instance, the following test code:
cat <<EOS | sed -n '/pattern1/,/pattern2/p'
foo
bar
pattern1
These lines
are printed.
pattern2
baz
EOS
outputs:
pattern1
These lines
are printed.
pattern2
However, the sed command above does not work if both patterns are located on the same line, because sed only starts looking for the end pattern on the line after the one that matched pattern1.
Moreover, the caret sign ^ and the dollar sign $ match the start and end of the line respectively. They do not indicate the positions of the substring to match.
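To see the same-line limitation in action, here is a minimal sketch. The three input lines are made up for illustration; the only thing taken from above is the range command itself.

```shell
# Hypothetical input: pattern1 and pattern2 both sit on the second line.
input='intro
foo pattern1 text pattern2 bar
baz'

# The range opens on the second line, but the end pattern is only checked
# from the third line on, so the range never closes.
out=$(printf '%s\n' "$input" | sed -n '/pattern1/,/pattern2/p')
printf '%s\n' "$out"
```

This prints the whole second line and everything after it, rather than just the text between the two patterns.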
Would you please try the following instead?
(Needless to say, I don't intend to parse XML files with sed. This is just a case study of substring extraction with sed.)
sed -n "s/.*h3 align='center'>\([^<]*\)<\/h3.*/\1/p" thefile
The pattern .*h3 align='center'>\([^<]*\)<\/h3.* matches:
- A substring which includes h3 align='center'> and any preceding characters back to the start of the line.
- Followed by a series of any characters excluding <.
- Followed by a substring which includes </h3 and any trailing characters up to the end of the line.
Then the s (substitute) command replaces the whole matched line with the second substring above, which the capture group makes available as \1. Combined with the -n option and the p flag, this prints only the extracted substring from each matching line.
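As a quick check of the substitution, here is a small sketch that feeds the command one line at a time instead of reading thefile. Both sample lines are hypothetical; the real file contents are not shown here.

```shell
# Hypothetical one-line sample standing in for a line of thefile.
line="<td><h3 align='center'>Hello World</h3></td>"
title=$(printf '%s\n' "$line" | sed -n "s/.*h3 align='center'>\([^<]*\)<\/h3.*/\1/p")

# A line without the h3 element: no substitution happens, so -n suppresses it.
none=$(printf '%s\n' '<p>no heading here</p>' | sed -n "s/.*h3 align='center'>\([^<]*\)<\/h3.*/\1/p")

printf '[%s] [%s]\n' "$title" "$none"
```

The first bracket holds Hello World and the second is empty, since lines that do not match produce no output at all.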
Let me go into detail about the second pattern \([^<]*\).
- The character class [^<] matches any character other than <.
- The "other than <" restriction is necessary to stop the match just before the following substring </h3. Otherwise, because matching is greedy, the match may run past it to a later </h3 on the same line.
- The asterisk sign * is a quantifier that determines the number of repetitions of the previous atom. In this case it matches a substring of zero or more characters other than <.
- The surrounding parens \( and \) create a capture group, and the enclosed substring can be referred to as \n (where n is the group's number in order of appearance) in the replacement.
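The points above can be tried out with a short sketch. The sample strings are invented for illustration, and the variant using \(.*\) is included only to show the greedy overrun. It is not a recommended command.

```shell
# Hypothetical sample line: one centered h3 followed by a second, plain h3.
line="<h3 align='center'>Title</h3><h3>Other</h3>"

# [^<]* cannot cross a '<', so the capture stops right before the first </h3.
bounded=$(printf '%s\n' "$line" | sed -n "s/.*h3 align='center'>\([^<]*\)<\/h3.*/\1/p")

# .* is greedy, so the capture runs on to the last </h3 on the line.
greedy=$(printf '%s\n' "$line" | sed -n "s/.*h3 align='center'>\(.*\)<\/h3.*/\1/p")

# Two capture groups, referenced in reverse order in the replacement.
swapped=$(printf '%s\n' 'first second' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

printf '%s\n' "$bounded" "$greedy" "$swapped"
```

The three printed lines are Title, then Title</h3><h3>Other, then second first. The middle one is the overrun that the [^<] class prevents.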
Hope this helps.