I'm using regular expressions in Java for the first time. I want to extract some parts of a string. The string is a little complex:
<description>
<a href='http://testlink.html' alt='some text'><img border='0'
src='http://s2.glbimg.com/zzag70iNYX-QK24sUp0YXQmmXhx7yb8j2Sq2YK7tvX3A6vCwEUOFnFTBONQFT-
ni/s.glbimg.com/es/ge/f/original/2012/04/25/image.jpg'
alt='some' title='text' /></a><br />some text; some text
</description>
I need to get the strings in the href and alt attributes. This is the code I'm using:
for (Element element : elements)
{
    //Elements children = element.children();
    Pattern pattern = Pattern.compile("a\\bhref=*(.html|.htm)>");
    String[] data = pattern.split(element.text()); ...
}
And so on. At the moment I'm only trying to get the href, without success. The result is always the whole string. Is my pattern incorrect? I added the .html extension to make the match more specific, but nothing changed.
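
For illustration, here is a minimal sketch of the kind of extraction described above, using only java.util.regex. Two observations motivate it. First, Pattern.split returns the whole input as a single element when the pattern never matches. Second, `a\\bhref` cannot match, because `\b` requires a word boundary and there is none between the letters `a` and `h`. The class name AttributeExtractor, the method extract, and the shortened sample string are assumptions made for this example, and a regex like this is only suitable for small, predictable snippets, not for HTML in general.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class AttributeExtractor {
    // Group 1: attribute name (href or alt); group 2: the opening quote;
    // group 3: the value, read lazily up to the same quote character.
    private static final Pattern ATTR =
            Pattern.compile("\\b(href|alt)\\s*=\\s*(['\"])(.*?)\\2");

    // Collects every href/alt attribute as "name=value", in order of appearance.
    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = ATTR.matcher(html);
        while (matcher.find()) { // find() scans for the next match; split() would discard it
            found.add(matcher.group(1) + "=" + matcher.group(3));
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened version of the sample string (the src attribute is omitted).
        String html = "<a href='http://testlink.html' alt='some text'>"
                + "<img border='0' alt='some' title='text' /></a><br />some text; some text";
        for (String attribute : extract(html)) {
            System.out.println(attribute);
        }
    }
}
```

The key design choice is Matcher.find() with capturing groups instead of Pattern.split(): split treats the matches as delimiters and throws them away, while find exposes the matched text itself.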