How can I use Java's regex Pattern and Matcher to isolate just the text Q170596? I tried building the pattern on regexr.com, but its escape characters don't correspond to Java's.
This is the text I'm trying to parse:
<!-- wikibase-toolbar --><span class="wikibase-toolbar-container"><span class="wikibase-toolbar-item wikibase-toolbar ">[<span class="wikibase-toolbar-item wikibase-toolbar-button wikibase-toolbar-button-edit"><a href="/wiki/Special:SetSiteLink/Q170596">edit</a></span>]</span></span>
I only need to extract Q170596; the rest can be thrown away.
I guess it would be something like this:
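import java.util.regex.Matcher;
import java.util.regex.Pattern;

// '/' needs no escaping in a Java string literal ("\/" does not compile),
// and the literal '[' and ']' are escaped as \\[ and \\] so they are not read as a character class.
Pattern p = Pattern.compile("<!-- wikibase-toolbar --><span class=\"wikibase-toolbar-container\"><span class=\"wikibase-toolbar-item wikibase-toolbar \">\\[<span class=\"wikibase-toolbar-item wikibase-toolbar-button wikibase-toolbar-button-edit\"><a href=\"/wiki/Special:SetSiteLink/(.*?)\">edit</a></span>\\]</span></span>");
String line;
while ((line = br.readLine()) != null)
{
    Matcher m = p.matcher(line);
    // find() also succeeds when the markup is only part of the line; matches() requires the whole line to match
    if (m.find())
    {
        // the pattern has a single capturing group, so only group(1) exists
        String thing_i_want = m.group(1);
    }
}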
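A narrower variant might anchor only on the href attribute instead of reproducing the whole line, which should be less sensitive to changes in the surrounding spans. This is just a minimal sketch: the class name `SiteLinkIdExtractor` and the method `extractId` are made up for illustration, and it assumes the ID never contains a double quote.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class SiteLinkIdExtractor {
    // Match only the href; [^"]+ captures everything up to the closing quote.
    private static final Pattern SITE_LINK =
            Pattern.compile("href=\"/wiki/Special:SetSiteLink/([^\"]+)\"");

    // Returns the captured ID, or an empty Optional if the line has no such link.
    static Optional<String> extractId(String line) {
        Matcher m = SITE_LINK.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened fragment of the sample line from above.
        String line = "<span class=\"wikibase-toolbar-button-edit\">"
                + "<a href=\"/wiki/Special:SetSiteLink/Q170596\">edit</a></span>";
        System.out.println(extractId(line).orElse("no match")); // prints Q170596
    }
}
```

Compiling the pattern once as a constant avoids recompiling it for every line read from the file.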
I was once told that using regex on HTML is not good style. Is that right? For this task I think it will work, won't it?