I am looking for duplicate attributes within the code base. I threw together an expression that works, but I am wondering whether it can be made simpler or more logical.

Sample input

test.append("<td class='no-order' style='text-align:center;' class=\"data text\">");

My attempt

<([^>]*)(class=('|\\")[^('|\\")]+('|\\"))([^>]*)(class=('|\\")[^('|\\")]+('|\\"))([^>]*)>

My thinking was to look for the start of a tag <, then anything that is not the end of a tag [^>]*, followed by a class attribute quoted with either ' or \", and then to repeat the whole thing.

As you can see, even though it works, it looks quite long and complicated. Is there a simpler way?

Edit:

Super bonus brownie points for whoever writes it in the form of a replace-all, so that it combines the attribute values after running.

epoch
  • you could use capturing groups.. check http://docs.oracle.com/javase/tutorial/essential/regex/groups.html – TheLostMind Apr 30 '14 at 14:11
  • the thing is a backreference matches _exactly_ the specified match, what if the attributes have different content? `class="hello", class="wow"` – epoch Apr 30 '14 at 14:13
  • are you really concerned about "hello" and "wow"?. Both are just names.. You just need class="...." right? – TheLostMind Apr 30 '14 at 14:15
  • consider using [jsoup](https://jsoup.org/) instead of regex. There are [many reasons](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) you should not use regex to parse HTML. – John Mercier Apr 25 '17 at 12:59
  • @JohnMercier, completely missing the point of the question... I have already accepted an answer. this was for a search and replace, not for parsing... – epoch May 04 '17 at 07:57

3 Answers

You can use the following regex:

<.+(class)=("|').+?\2.+?\1.+>

Escape the backslashes and double quotes before you use it in a Java string literal.

If the regex finds a match in the string, the string contains duplicates; otherwise it doesn't.

Explanation:

<.+(class)=("|') matches the < plus any characters until it reaches class= followed by a single or double quote.

The rest of the regex matches only if the same quote character appears again (backreference \2) and the word class then appears again somewhere later in the string (backreference \1).
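As a minimal sketch of how this could be wired up in Java: the class name DuplicateClassCheck and the helper hasDuplicateClass are made up for illustration, and the sample strings assume the source line is read as raw text, so the backslashes in front of the double quotes are still present.

```java
import java.util.regex.Pattern;

class DuplicateClassCheck {
    // The regex from above, with backslashes and double quotes escaped
    // for a Java string literal.
    private static final Pattern DUPLICATE_CLASS =
            Pattern.compile("<.+(class)=(\"|').+?\\2.+?\\1.+>");

    // Hypothetical helper: true if a class attribute is followed later
    // in the same line by the word "class" again.
    static boolean hasDuplicateClass(String line) {
        // find() searches inside the line; matches() would require the
        // whole line to fit the pattern.
        return DUPLICATE_CLASS.matcher(line).find();
    }

    public static void main(String[] args) {
        String duplicated =
                "test.append(\"<td class='no-order' style='text-align:center;' class=\\\"data text\\\">\");";
        String single =
                "test.append(\"<td class='no-order' style='text-align:center;'>\");";
        System.out.println(hasDuplicateClass(duplicated)); // true
        System.out.println(hasDuplicateClass(single));     // false
    }
}
```

Note that \1 only repeats the literal text class, so the check does not care whether the two attribute values differ.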

Amit Joki

Simply count the matches of class=("|') to check for multiple class attributes.

Sample code:

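    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String str = "test.append(\"<td class='no-order' style='text-align:center;' class=\"data text\">\");";

    // Count every occurrence of class= followed by a quote character.
    Pattern pattern = Pattern.compile("class=(\"|')");
    Matcher matcher = pattern.matcher(str);
    int count = 0;
    while (matcher.find()) {
        count++;
    }

    // More than one occurrence means the attribute is duplicated.
    if (count > 1) {
        System.out.println("multiple class attribute found");
    }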

output:

multiple class attribute found
Braj

To build on what Amit Joki suggested, if you want to make sure it's in the same element you could use:

<.+(class)=("|').+?\2[^>]+?\1.+>

The addition of [^>] makes sure the second class attribute resides in the same element, because it matches anything except the character that closes the tag.
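Taking the [^>] idea a step further, the replace-all asked about in the question could be sketched roughly as follows. This is only one possible approach, not a tested solution for every code base: the class name ClassAttributeMerger, the helper mergeDuplicateClasses and the pattern itself are assumptions made for illustration. It assumes attribute values contain neither > nor the quote character of the first attribute, and that quotes may appear with or without a leading backslash.

```java
import java.util.regex.Pattern;

class ClassAttributeMerger {
    // Groups: 1 = tag start up to the first class=, 2 = its opening quote
    // (optionally backslash-escaped), 3 = first value, 4 = attributes in
    // between, 5 = second opening quote, 6 = second value.
    // [^>] keeps every part inside a single tag.
    private static final Pattern DUPLICATE_CLASS = Pattern.compile(
            "(<[^>]*?\\sclass=)(\\\\?[\"'])([^>]*?)\\2([^>]*?)\\s+class=(\\\\?[\"'])([^>]*?)\\5");

    // Hypothetical helper: folds the second class value into the first
    // attribute and drops the second attribute.
    static String mergeDuplicateClasses(String line) {
        String previous;
        String current = line;
        do {
            // Each pass merges one pair per tag, so repeat until stable.
            previous = current;
            current = DUPLICATE_CLASS.matcher(previous)
                    .replaceAll("$1$2$3 $6$2$4");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String line =
                "test.append(\"<td class='no-order' style='text-align:center;' class=\\\"data text\\\">\");";
        System.out.println(mergeDuplicateClasses(line));
        // prints: test.append("<td class='no-order data text' style='text-align:center;'>");
    }
}
```

The merged attribute keeps the quote style of the first class attribute, and the loop handles tags that carry more than two class attributes.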

tmgardne