You can give regex a try.
Try this pattern: >\s+(.*?)\s+<
It captures the text between the > and the following <, without the surrounding whitespace.
Please keep one thing in mind: the regex solution will only work if you have already extracted this string on its own:
< a: href > https://0.0.0.1/abcd/openthis.pdf < /a: href>
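A minimal sketch of applying that pattern in Java, assuming the fragment has already been isolated exactly as shown above. The class and method names (LinkTextExtractor, extract) are hypothetical and only serve the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkTextExtractor {
    // Lazily captures whatever sits between '>' and the next '<',
    // leaving out the whitespace on either side.
    private static final Pattern BETWEEN_TAGS = Pattern.compile(">\\s+(.*?)\\s+<");

    /** Returns the captured text, or null if the input does not have the expected shape. */
    public static String extract(String fragment) {
        Matcher m = BETWEEN_TAGS.matcher(fragment);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The already-extracted string from the answer above.
        String fragment = "< a: href > https://0.0.0.1/abcd/openthis.pdf < /a: href>";
        System.out.println(extract(fragment)); // prints https://0.0.0.1/abcd/openthis.pdf
    }
}
```

Note that the pattern requires at least one whitespace character on each side of the text, so a fragment without that padding returns null here.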
In general, use an HTML parser
to extract the text from the corresponding HTML code.
Here is a reason why you should not parse HTML with regex.
I would give HtmlCleaner a try.
HtmlCleaner is a Java library used to safely parse and transform any HTML found on the web to well-formed XML. It is designed to be small, fast, flexible and independent. HtmlCleaner may be used in Java code, as a command-line tool or as an Ant task. The result of parsing is a lightweight document object model which can easily be transformed to standards like DOM or JDOM, or serialized to XML output in various ways (compact, pretty printed and so on).
You can use XPath
with HtmlCleaner to get the contents of XML/HTML tags. Here is a nice
example: XPath Example
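To give a feel for the XPath step, here is a rough sketch that runs an XPath query over markup that is already well-formed, which is the state a cleaner is meant to leave it in. It does not call HtmlCleaner itself. It uses the JDK's built-in javax.xml.xpath as a stand-in, and the input snippet, the class name XPathLinkText and the method linkText are all assumptions made for the illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathLinkText {
    /** Returns the trimmed text of the first <a> element in already well-formed markup. */
    public static String linkText(String wellFormedXml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // normalize-space() takes the string value of the first matching node
            // and strips the leading/trailing whitespace.
            return xpath.evaluate("normalize-space(//a)",
                    new InputSource(new StringReader(wellFormedXml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical cleaned-up version of the link; not taken from the question.
        String xml = "<p><a href=\"https://0.0.0.1/abcd/openthis.pdf\">"
                + " https://0.0.0.1/abcd/openthis.pdf </a></p>";
        System.out.println(linkText(xml)); // prints https://0.0.0.1/abcd/openthis.pdf
    }
}
```

Unlike the regex, the query keeps working when attributes, nesting or whitespace around the link change, as long as the markup has been made well-formed first.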