I am parsing the name and value attributes of input tags from an HTML page using a regex. However, my regex is not returning all the required values. Below is a snippet of the HTML page:

<input style="display: none;" name="hiddenAction" value="myval" type="hidden">
<input name="ml_uiss" id="ml_uiss" value="aba972kd82lw" type="hidden">
<input style="display: none;" name="Key" id="Key" value="56n8f48jfn98cwnc38c398nc83nx2b9c32n.an24" type="text">
<input name="AvKbkGPQr" class="iswickEnabled input" maxlength="10" id="AvKbkGPQr" onkeyup="javascript:checkIt(this);" onkeydown="javascript:checkIt(this);" onchange="javascript:checkIt(this);" value="1234567890" onfocus="this.value='';" type="text"> <input name="PjbkAPker" class="iswickEnabled input" maxlength="10" id="PjbkAPker" onkeyup="javascript:checkIt(this);" onkeydown="javascript:checkIt(this);" onchange="javascript:checkIt(this);" type="text"> 
<input id="timeCheck" name="timeCheck" value="23:38:20" type="hidden">
<input name="isDone" id="isDone" value="prq" type="hidden">

Below is the code with the regex:

String reg = "<input.*name=['\"](\\w+)['\"].*\\svalue=['\"]([\\w:.\\s]+)['\"].*(<input name=\"(\\w+)\")?";
Pattern p = Pattern.compile(reg);
Matcher m = p.matcher(myString);
while (m.find()) {
    String match1 = m.group(1);
    String match2 = m.group(2);
    String match3 = m.group(3);
    String match4 = m.group(4);
    System.out.println("[" + match1 + "][" + match2 + "][" + match3+ "][" + match4 + "]");
}

The output is:

[hiddenAction][myval][null][null]
[ml_uiss][aba972kd82lw][null][null]
[Key][56n8f48jfn98cwnc38c398nc83nx2b9c32n.an24][null][null]
[AvKbkGPQr][1234567890][null][null]
[timeCheck][23:38:20][null][null]
[isDone][prq][null][null]

The 4th line of the HTML contains two input tags, and the regex does not pick up the name of the second one, PjbkAPker (it is missing from the output). Everything else works.
I want to get the second input name as well.

ravi

1 Answer

Parsing X/HTML with regular expressions is a bad idea™.

Try using jsoup instead:

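import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// htmlString holds the HTML snippet (myString in the question)
Document doc = Jsoup.parseBodyFragment(htmlString);
Elements inputs = doc.select("input");
for (Element el : inputs) {
  Attributes attrs = el.attributes();
  System.out.print("ELEMENT: " + el.tagName());
  for (Attribute attr : attrs) {
    System.out.print(" " + attr.getKey() + "=" + attr.getValue());
  }
  System.out.println();
}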
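
If adding a library is not an option, the regex approach can probably be patched for this particular snippet, though the caveat above still applies. The pattern in the question uses a greedy `.*`, which consumes the whole fourth line in a single match, and it also requires a value attribute, which the second input does not have. The sketch below is only an illustration, not a general HTML parser: it assumes attribute values never contain `>`, and the class name `InputAttrs` and its helper methods are hypothetical. It matches one tag at a time with `[^>]*`, then looks up name and value inside each tag separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the question's code.
class InputAttrs {
    // One match per tag: [^>]* cannot run past '>' into the next tag.
    // Assumption: no attribute value contains a '>' character.
    private static final Pattern TAG = Pattern.compile("<input\\b[^>]*>");
    // \s before the attribute name keeps "this.value=" inside onfocus from matching.
    // Group 1 is the opening quote, \1 requires the same closing quote.
    private static final Pattern NAME = Pattern.compile("\\sname=(['\"])(.*?)\\1");
    private static final Pattern VALUE = Pattern.compile("\\svalue=(['\"])(.*?)\\1");

    // Returns the attribute's value, or null if the tag does not have it.
    static String attr(Pattern p, String tag) {
        Matcher m = p.matcher(tag);
        return m.find() ? m.group(2) : null;
    }

    // Builds one "[name][value]" row per input tag, in document order.
    static List<String> parse(String html) {
        List<String> rows = new ArrayList<>();
        Matcher tags = TAG.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            rows.add("[" + attr(NAME, tag) + "][" + attr(VALUE, tag) + "]");
        }
        return rows;
    }

    public static void main(String[] args) {
        // Shortened version of the 4th line from the question.
        String html = "<input name=\"AvKbkGPQr\" maxlength=\"10\" value=\"1234567890\" "
                + "onfocus=\"this.value='';\" type=\"text\"> "
                + "<input name=\"PjbkAPker\" maxlength=\"10\" type=\"text\">";
        for (String row : parse(html)) {
            System.out.println(row);
        }
    }
}
```

For the shortened fourth line used in `main`, this prints `[AvKbkGPQr][1234567890]` followed by `[PjbkAPker][null]`, so the second input is reported even though it has no value attribute.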
maerics