I need a Java Pattern to fetch all the attributes of an HTML element that is passed in as a String. Initially, I extracted them using the following patterns:

private static Pattern scriptletvalue1 = Pattern.compile("value=\"([^\"]*)\"");
private static Pattern scriptletvalue2 = Pattern.compile("value='([^']*)'");
private static Pattern scriptletId1 = Pattern.compile("id=\"([^\"]*)\"");
private static Pattern scriptletId2 = Pattern.compile("id='([^']*)'");

and so on for all the attributes. This works fine as long as there are no double quotes inside the attribute values. But a scriptlet inside an attribute value may call functions whose parameters contain double quotes, and that is where the patterns above fail.

So for an attribute

<div value="<%=AnyText%>"></div>

The first pattern would give me <%=AnyText%>

But when I use the same pattern for

<div value="<%=myFunction.getValue("some Key")%>"></div>

the pattern would return <%=myFunction.getValue( instead of <%=myFunction.getValue("some Key")%>

How can I fix it?
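One possible direction, as a minimal sketch rather than a definitive fix: let the value be a run of either whole scriptlets (`<% ... %>`) or single characters other than the closing quote, so that quotes inside a scriptlet are skipped over. The class `AttributeExtractor` and the method `extract` are hypothetical names used only for illustration. The sketch assumes a scriptlet never contains `%>` inside a string literal, and it only returns the first occurrence of the attribute.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original code.
final class AttributeExtractor {

    // Returns the raw value of the first occurrence of attributeName, if any.
    static Optional<String> extract(String markup, String attributeName) {
        String name = Pattern.quote(attributeName);
        // The value is a sequence of whole scriptlets (<% ... %>) or single
        // characters that are not the closing quote. The scriptlet
        // alternative is tried first, so quotes inside it do not end the value.
        Pattern p = Pattern.compile(
                "(?<![\\w-])" + name + "\\s*=\\s*(?:"
                        + "\"((?:<%.*?%>|[^\"])*)\""   // double-quoted value
                        + "|'((?:<%.*?%>|[^'])*)'"     // single-quoted value
                        + ")",
                Pattern.DOTALL); // allow scriptlets that span several lines
        Matcher m = p.matcher(markup);
        if (!m.find()) {
            return Optional.empty();
        }
        // Exactly one of the two groups is non-null after a match.
        return Optional.of(m.group(1) != null ? m.group(1) : m.group(2));
    }

    public static void main(String[] args) {
        String markup =
                "<div value=\"<%=myFunction.getValue(\"some Key\")%>\"></div>";
        // prints <%=myFunction.getValue("some Key")%>
        System.out.println(extract(markup, "value").orElse("(not found)"));
    }
}
```

The lookbehind `(?<![\w-])` keeps `id` from matching inside a longer name such as `valid`. Because both alternatives can consume a `<`, malformed input with no closing quote may backtrack heavily, so this is probably only suitable for short, well-formed tags.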

Cœur
Aravind
  • Which one do you really want - `A method to extract all attributes of a HTML element` Or `Java RegEx pattern to extract all attributes of a HTML element` ? – sarveshseri May 21 '18 at 10:57
  • Don't use regex to process HTML: https://stackoverflow.com/a/1732454/2610466 – Krypt1 May 21 '18 at 10:58
  • @SarveshKumarSingh, a Java pattern to pick the entire attribute. I'll just specify the attribute name in the pattern itself – Aravind May 21 '18 at 11:12
  • @Krypt1, I'm not parsing the processed HTML. I have a JSP file and that is where I am parsing – Aravind May 21 '18 at 11:15
  • There is no such thing as `Java pattern`, you are using `Regular Expressions`. Now, the question is - why do you specifically want to do it with `Regular Expressions` and not something else? – sarveshseri May 21 '18 at 11:34
  • @SarveshKumarSingh, I am talking about java.util.regex.Pattern. It's not precisely regex, right? Isn't there a small difference? And if regex isn't the best option, I'm open to suggestions. – Aravind May 21 '18 at 11:49
  • It is RegEx. And if you are open for alternatives then use any html parser library. Example - https://jsoup.org/cookbook/input/parse-body-fragment – sarveshseri May 21 '18 at 11:52
  • That helps, thanks @SarveshKumarSingh. – Aravind May 21 '18 at 11:58

0 Answers