extract string using java StringTokenizer, split or scanner

Question

I want to extract the string between <a: href> and </a: href> from the following:

<a: href> https://0.0.0.1/abcd/openthis.pdf </a: href>

using StringTokenizer, split or scanner.
I'm trying to use StringTokenizer with <a: href> and </a: href> as delimiters, but it's not working. I tried to escape <, > and :, but that doesn't seem to be the problem. My guess is that it won't accept a word or a phrase as a delimiter.

Use an HTML parser, that's what they're for. Also consider searching this site before asking this as this same question gets asked just about every other day. — Hovercraft Full Of Eels, Jan 24 '12 at 01:28

score 0 · Answer 1 · edited May 23 '17 at 10:25


You can give Regex a try.

Try this regex: >\s+(.*?)\s+<
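A minimal sketch of how that pattern could be applied in Java, assuming the input is the single line from the question. The class name `RegexExtract` and the method `extract` are made up for illustration. Note that the pattern needs at least one whitespace character on each side of the captured text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExtract {
    // Same pattern as above; backslashes are doubled for the Java string literal.
    private static final Pattern BETWEEN_TAGS = Pattern.compile(">\\s+(.*?)\\s+<");

    // Returns the first captured text between '>' and '<', or null if nothing matches.
    static String extract(String input) {
        Matcher m = BETWEEN_TAGS.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<a: href> https://0.0.0.1/abcd/openthis.pdf </a: href>";
        System.out.println(extract(input)); // prints https://0.0.0.1/abcd/openthis.pdf
    }
}
```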

Please keep one thing in mind: the regex solution will only work if you have already extracted this string on its own:

< a: href > https://0.0.0.1/abcd/openthis.pdf < /a: href>
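If you want to stay with split or Scanner as asked, here is a rough sketch. StringTokenizer treats its delimiter string as a set of single characters, not as a phrase, which is why it breaks the URL apart. split and Scanner.useDelimiter take a regex instead, so Pattern.quote can turn each tag into a literal delimiter. The class and method names below are hypothetical, and the tags are assumed to be written exactly as in the question (<a: href> and </a: href>).

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class DelimiterExtract {
    // Assumed tag spelling, copied from the question.
    static final String OPEN = "<a: href>";
    static final String CLOSE = "</a: href>";

    // Regex that matches either tag literally.
    static final String DELIMS = Pattern.quote(OPEN) + "|" + Pattern.quote(CLOSE);

    // String.split: return the first non-blank piece between the tags.
    static String viaSplit(String input) {
        for (String part : input.split(DELIMS)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                return trimmed;
            }
        }
        return null;
    }

    // Scanner: same idea, with the tags as the delimiter pattern.
    static String viaScanner(String input) {
        try (Scanner sc = new Scanner(input).useDelimiter(DELIMS)) {
            while (sc.hasNext()) {
                String token = sc.next().trim();
                if (!token.isEmpty()) {
                    return token;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String input = "<a: href> https://0.0.0.1/abcd/openthis.pdf </a: href>";
        System.out.println(viaSplit(input));   // prints https://0.0.0.1/abcd/openthis.pdf
        System.out.println(viaScanner(input)); // prints https://0.0.0.1/abcd/openthis.pdf
    }
}
```

Both versions only cope with this one flat string; they know nothing about nesting or attributes.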

In general, use an HTML parser to extract text from HTML.

Here is a reason why you should not parse HTML with regex.

I would give htmlcleaner a try.

HtmlCleaner is a Java library used to safely parse and transform any HTML found on the web to well-formed XML. It is designed to be small, fast, flexible and independent. HtmlCleaner may be used in Java code, as a command line tool or as an Ant task. The result of parsing is a lightweight document object model which can easily be transformed to standards like DOM or JDom, or serialized to XML output in various ways (compact, pretty printed and so on).

You can use XPath with HtmlCleaner to get the contents within XML/HTML tags. Here is a nice example: Xpath Example


answered Jan 24 '12 at 01:31

RanRag


Re: use a regex: please see my [favorite answer ever](http://stackoverflow.com/a/1732454/576139) – Chris Eberle Jan 24 '12 at 01:42
@Chris: that answer is pure poetry. I cried. :) – Hovercraft Full Of Eels Jan 24 '12 at 01:47
I think his answer was better. It beats any answer I've given. – Hovercraft Full Of Eels Jan 24 '12 at 01:51
I agree with that. But I at least deserve a single upvote for my effort :P . – RanRag Jan 24 '12 at 01:52
