Don't use regular expressions for parsing HTML. Seriously, it's more complicated than you think.
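To make that concrete, here is a small sketch of how a plausible-looking pattern goes wrong. The class name `RegexPitfall`, the helper `naiveClassValues` and the sample markup are all made up for illustration. The pattern only understands double-quoted `class` attributes, so it reports a value from inside a comment and one from plain text, and misses the real single-quoted attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPitfall {
    // Deliberately naive: matches class="..." anywhere in the input,
    // with no idea whether it is inside a tag, a comment, or text.
    private static final Pattern CLASS_ATTR =
        Pattern.compile("class=\"([^\"]*)\"");

    static List<String> naiveClassValues(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = CLASS_ATTR.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<!-- <p class=\"ghost\"> --><p class='real'>"
            + "Write class=\"text\" to style it</p>";
        // "ghost" comes from a comment, "text" from character data,
        // and the only real attribute value, "real", is not found.
        System.out.println(naiveClassValues(html)); // prints [ghost, text]
    }
}
```

You can keep patching the pattern for quotes, comments, CDATA, scripts and so on, but at that point you are writing a parser anyway.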
If your document is actually XHTML, you can use XPath:
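
    // Needs java.io.StringReader, java.util.Collections, javax.xml.xpath.*,
    // org.w3c.dom.NodeList and org.xml.sax.InputSource.
    // "attributes" is assumed to be an existing Collection<String>, e.g. a Set.
    XPath xpath = XPathFactory.newInstance().newXPath();

    // Select every class, id, for and name attribute in the document.
    NodeList nodes = (NodeList) xpath.evaluate(
        "//@*["
            + "local-name()='class'"
            + " or local-name()='id'"
            + " or local-name()='for'"
            + " or local-name()='name'"
            + "]",
        new InputSource(new StringReader(htmlContent)),
        XPathConstants.NODESET);

    int count = nodes.getLength();
    for (int i = 0; i < count; i++) {
        // A value may hold several whitespace-separated names.
        // trim() keeps leading whitespace from producing an empty token.
        Collections.addAll(attributes,
            nodes.item(i).getNodeValue().trim().split("\\s+"));
    }
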
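For anyone who wants to try the XPath approach in isolation, here is a minimal self-contained sketch. The class name `XhtmlAttributeNames`, the `collect` helper, the `TreeSet` (chosen only to get a stable print order) and the sample markup are my own assumptions, not part of the question.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlAttributeNames {
    // Collects the whitespace-separated tokens of all class, id, for
    // and name attributes in a well-formed XHTML string.
    static Set<String> collect(String xhtml) throws XPathExpressionException {
        Set<String> attributes = new TreeSet<>();
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "//@*[local-name()='class' or local-name()='id'"
                + " or local-name()='for' or local-name()='name']",
            new InputSource(new StringReader(xhtml)),
            XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            Collections.addAll(attributes,
                nodes.item(i).getNodeValue().trim().split("\\s+"));
        }
        return attributes;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<label for=\"email\" class=\"field required\">Email</label>"
            + "<input id=\"email\" name=\"user-email\"/>"
            + "</body></html>";
        System.out.println(collect(xhtml)); // prints [email, field, required, user-email]
    }
}
```

Note that this only works when the input is well-formed XML; anything else makes `evaluate` throw an `XPathExpressionException`.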
If it's not XHTML, you can use Swing's HTML parsing:
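
    // Needs java.io.StringReader, java.util.Collections,
    // javax.swing.text.AttributeSet, javax.swing.text.MutableAttributeSet,
    // and HTML, HTMLDocument, HTMLEditorKit from javax.swing.text.html.
    // "attributes" is assumed to be an existing Collection<String>, e.g. a Set.
    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
        private final Object[] attributesOfInterest = {
            HTML.Attribute.CLASS,
            HTML.Attribute.ID,
            "for",  // HTML.Attribute has no FOR constant, so use the plain name
            HTML.Attribute.NAME,
        };

        // Adds every name found in the attributes of interest.
        private void addAttributes(AttributeSet attr) {
            for (Object a : attributesOfInterest) {
                Object value = attr.getAttribute(a);
                if (value != null) {
                    // trim() keeps leading whitespace from producing an empty token
                    Collections.addAll(attributes,
                        value.toString().trim().split("\\s+"));
                }
            }
        }

        // Called for tags that have a matching end tag.
        @Override
        public void handleStartTag(HTML.Tag tag,
                                   MutableAttributeSet attr,
                                   int pos) {
            addAttributes(attr);
            super.handleStartTag(tag, attr, pos);
        }

        // Called for tags without content, such as <input> or <br>.
        @Override
        public void handleSimpleTag(HTML.Tag tag,
                                    MutableAttributeSet attr,
                                    int pos) {
            addAttributes(attr);
            super.handleSimpleTag(tag, attr, pos);
        }
    };

    HTMLDocument doc = (HTMLDocument)
        new HTMLEditorKit().createDefaultDocument();
    // The third argument tells the parser to ignore any charset declaration.
    doc.getParser().parse(new StringReader(htmlContent), callback, true);
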
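Wrapped up as something you can compile and run, the Swing approach might look like the sketch below. The class name `SwingAttributeNames`, the `collect` helper, the `TreeSet` and the sample markup are assumptions for illustration. I have left out the expected output on purpose, since Swing's parser is lenient and old, and how it treats newer or unknown tags is worth checking against your own input.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import javax.swing.text.AttributeSet;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class SwingAttributeNames {
    // Collects the whitespace-separated tokens of all class, id, for
    // and name attributes reported by Swing's HTML parser.
    static Set<String> collect(String html) throws IOException {
        final Set<String> attributes = new TreeSet<>();
        final Object[] keys = {
            HTML.Attribute.CLASS, HTML.Attribute.ID, "for", HTML.Attribute.NAME
        };

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private void addAttributes(AttributeSet attr) {
                for (Object key : keys) {
                    Object value = attr.getAttribute(key);
                    if (value != null) {
                        Collections.addAll(attributes,
                            value.toString().trim().split("\\s+"));
                    }
                }
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attr, int pos) {
                addAttributes(attr);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attr, int pos) {
                addAttributes(attr);
            }
        };

        HTMLDocument doc = (HTMLDocument) new HTMLEditorKit().createDefaultDocument();
        doc.getParser().parse(new StringReader(html), callback, true);
        return attributes;
    }

    public static void main(String[] args) throws IOException {
        // Unclosed and unquoted markup is fine here, unlike with XPath.
        String html = "<div id=main class=\"note wide\">"
            + "<label for=email>Email</label><input name=email></div>";
        System.out.println(collect(html));
    }
}
```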
As for doing it without a loop, I don't think that's possible. But any implementation is going to use one or more loops internally anyway.
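The closest you can get is to hide the loop behind a stream pipeline. The sketch below (Java 8 or later) does that for the XPath variant; the class name `StreamedAttributeNames`, the `tokens` and `collect` helpers and the sample markup are assumptions for illustration. There is no `for` statement in it, but `IntStream.range` and `flatMap` are still iterating under the hood.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StreamedAttributeNames {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Turns a list of attribute nodes into the set of tokens in their values.
    static Set<String> tokens(NodeList nodes) {
        return IntStream.range(0, nodes.getLength())
            .mapToObj(nodes::item)
            .map(Node::getNodeValue)
            .flatMap(WHITESPACE::splitAsStream)
            .filter(token -> !token.isEmpty()) // drop empties from leading whitespace
            .collect(Collectors.toCollection(TreeSet::new));
    }

    static Set<String> collect(String xhtml) throws XPathExpressionException {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
            "//@*[local-name()='class' or local-name()='id'"
                + " or local-name()='for' or local-name()='name']",
            new InputSource(new StringReader(xhtml)),
            XPathConstants.NODESET);
        return tokens(nodes);
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<div id=\"main\" class=\" note  wide\"><p class=\"note\"/></div>";
        System.out.println(collect(xhtml)); // prints [main, note, wide]
    }
}
```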