
I want a regex that removes a list of CSS properties from within the style attribute of a given HTML tag.

For example, I want to remove height and cursor from the span tag.

Input:

String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";

Output:

<span id="nav-askquestion" style="width:200px;" name="questions"><b>hh</b></span>

I have the following regex, but it removes all occurrences of height and cursor, not just those inside the span's style attribute:

String cleanString=htmlFragment.replaceAll("(height|cursor)[ ]*:[ ]*[^;]+;",""); 

I'm not looking to use an HTML parser for this due to a specific requirement.

Abhishek Ranjan
  • I strongly suggest not using RegEx for this. You should look at HTML/XML parsers for parsing the tags and data, and then do the operations. – user2004685 Jan 13 '17 at 20:53
  • [See also](http://stackoverflow.com/a/1732454/1553851) – shmosel Jan 13 '17 at 20:53
  • To only replace that in a certain tag, you will have to make a RegEx search to find all those tags, then, among those, select which ones to modify, and then modify them. You cannot use only one RegEx for this.
    – XMB5 Jan 13 '17 at 21:10
  • Even when you think a parsing case is too “simple” to worry about the consequences of using regular expressions, it often isn’t. See http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg . – VGR Jan 13 '17 at 21:43

3 Answers


Try this regular expression:

\s*(height|cursor)\s*:\s*.+?\s*;\s*

You can test it out here.

If there are other properties besides height and cursor that you want to capture, keep adding them to the alternation, separated by bars: (background-color|height|font-size), etc.
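A minimal sketch of applying this pattern in Java with String.replaceAll, using the question's input (the class name LonnieRegexDemo and the clean helper are hypothetical). Note that replaceAll scans the whole string, so a matching declaration in the element's text would be removed too, and the lazy .+? can run past a closing quote when a declaration has no semicolon.

```java
public class LonnieRegexDemo {
    // The answer's pattern, with backslashes doubled for a Java string literal.
    static final String PATTERN = "\\s*(height|cursor)\\s*:\\s*.+?\\s*;\\s*";

    static String clean(String html) {
        // Applies to the entire string, not only to the style attribute.
        return html.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        // prints: <span id="nav-askquestion" style="width:200px;" name="questions"> <b>hh</b></span>
        System.out.println(clean(htmlFragment));
    }
}
```

This reproduces the question's problem: clean("<span>height: 5;</span>") also strips the text, giving "<span></span>".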

Lonnie Best
  • I wrote something similar; the issue is that I want to remove some attributes from within the style attribute only. So if it's
    hello
    , the output should be
    – Abhishek Ranjan Jan 13 '17 at 22:15
  • Does Java regex support positive lookbehinds? http://www.regular-expressions.info/lookaround.html – Lonnie Best Jan 13 '17 at 22:22
  • It does, to some extent: http://stackoverflow.com/questions/16340662/java-regex-positive-lookahead – Abhishek Ranjan Jan 13 '17 at 22:28
  • Well, then maybe you can add a positive lookbehind that makes sure the match is preceded by style="[^"]+ (in other words, an unclosed quote of a style attribute). – Lonnie Best Jan 13 '17 at 22:31
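A sketch of that lookbehind idea in Java, assuming no style value exceeds 200 characters: Java's regex engine generally rejects lookbehinds without an obvious maximum length, so [^"]{0,200} stands in for the suggested [^"]+. The class name LookbehindSketch is hypothetical, the pattern does not check which tag the style attribute belongs to, and a property such as line-height would be partially matched.

```java
import java.util.regex.Pattern;

public class LookbehindSketch {
    // Assumption: a style value is at most 200 characters long (bounded lookbehind).
    // A match is accepted only if it sits inside an unclosed style="... value,
    // because [^"] cannot cross the attribute's closing quote.
    private static final Pattern INSIDE_STYLE = Pattern.compile(
            "(?<=style=\"[^\"]{0,200})\\s*(height|cursor)\\s*:[^;\"]*;?");

    static String clean(String html) {
        return INSIDE_STYLE.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        // prints: <span id="nav-askquestion" style="width:200px;" name="questions"> <b>hh</b></span>
        System.out.println(clean(htmlFragment));
    }
}
```

Unlike the plain pattern, text such as <span style="width:1px">height:5;</span> is left unchanged, since the closing quote blocks the lookbehind.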

I agree with others that it would be better to use HTML/XML parsers, which allow you to drill down to specific elements without worrying about any "accidental" regex matches.

However, having read Xlsx's comment that "You cannot use only one RegEx for this," I was compelled to post this solution using capturing groups. It is for demonstration purposes only:

String reg = "(<span.+)((height|cursor) *:[^;]+;)(.*)((height|cursor) *:[^;]+;)(.*)";

String cleanString=htmlFragment.replaceAll(reg, "$1$4$7"); 

Obviously, it is not pretty: it only works when both declarations are present, and it may still match on some HTML content (as opposed to tags), but it is possible. Unless this is intended as a quick fix, I urge you to use a more appropriate solution, as suggested by others. One possible solution would be jsoup.
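For comparison, the two-step approach from XMB5's comment (first find each span's style attribute, then clean only its value) can be sketched with Matcher.appendReplacement. This is a minimal sketch under stated assumptions: attribute values are double-quoted, no > appears among the span's attributes before style, and the class and method names are hypothetical. It accepts any list of properties and also works when only some of them are present.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanStyleCleaner {
    // Step 1: locate the style attribute of each <span ...> start tag.
    // Group 1 = "<span ... style=\"", group 2 = the style value up to the closing quote.
    private static final Pattern SPAN_STYLE =
            Pattern.compile("(<span\\b[^>]*?\\sstyle=\")([^\"]*)");

    static String removeProperties(String html, List<String> properties) {
        // Step 2: one "name: value;" declaration for any of the listed properties.
        // (?<![\\w-]) stops "height" from matching the tail of "line-height".
        StringJoiner names = new StringJoiner("|");
        for (String p : properties) {
            names.add(Pattern.quote(p));
        }
        Pattern declaration = Pattern.compile(
                "(?<![\\w-])(?:" + names + ")\\s*:[^;]*;?\\s*");

        Matcher m = SPAN_STYLE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Clean only the style value; everything outside it is copied unchanged.
            String cleanedValue = declaration.matcher(m.group(2)).replaceAll("");
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + cleanedValue));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        // prints: <span id="nav-askquestion" style="width:200px;" name="questions"> <b>hh</b></span>
        System.out.println(removeProperties(htmlFragment, Arrays.asList("height", "cursor")));
    }
}
```

Styles on other tags (for example a surrounding div) and text content inside the span are left alone, because the declaration pattern only ever sees the captured style value.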

Frelling

As I said before, I strongly suggest not using RegEx for this. Use HTML/XML parsers to parse the tags and data, and then do all your operations.

But if you don't want to do that for some reason, I would suggest falling back to basic substring-based methods rather than using RegEx.

Here is a sample code snippet for the above situation:

public class StyleCleaner {
    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        int startIndex = htmlFragment.indexOf("<span");
        int stopIndex = htmlFragment.indexOf("</span>") + "</span>".length();

        /* Cursor: cut from "cursor:" up to and including the next ';' */
        int cursorStart = htmlFragment.indexOf("cursor:", startIndex);
        int cursorEnd = htmlFragment.indexOf(";", cursorStart);
        if (cursorStart >= 0 && cursorEnd >= 0) {
            htmlFragment = new StringBuilder()
                    .append(htmlFragment.substring(startIndex, cursorStart))
                    .append(htmlFragment.substring(cursorEnd + 1, stopIndex))
                    .toString();
        }

        /* Update Indices: the fragment now starts at "<span", so recompute both */
        startIndex = htmlFragment.indexOf("<span");
        stopIndex = htmlFragment.indexOf("</span>") + "</span>".length();

        /* Height: same cut for "height:" */
        int heightStart = htmlFragment.indexOf("height:", startIndex);
        int heightEnd = htmlFragment.indexOf(";", heightStart);
        if (heightStart >= 0 && heightEnd >= 0) {
            htmlFragment = new StringBuilder()
                    .append(htmlFragment.substring(startIndex, heightStart))
                    .append(htmlFragment.substring(heightEnd + 1, stopIndex))
                    .toString();
        }

        /* Output */
        System.out.println(htmlFragment);
    }
}

I know it looks a bit messy, but that's the only way I could think of.

user2004685
  • StringBuilder is overkill for the one-time concatenation of two strings. It provides no benefit while reducing readability. – VGR Jan 13 '17 at 21:40
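Along the lines of VGR's comment, the same substring idea can be written with plain + concatenation. The sketch below (hypothetical class SubstringStyleCleaner and method removeProperty) also limits the search to the value of the first span's style attribute, assuming double-quoted attributes and the exact "name:" spelling with no space before the colon; like the original, a plain indexOf on "height:" would also hit line-height.

```java
public class SubstringStyleCleaner {
    private static final String STYLE_OPEN = "style=\"";

    // Hypothetical helper: removes one "property:value;" declaration from the style
    // attribute of the first <span>, using only indexOf/substring and + concatenation.
    // Returns the input unchanged if the span, its style attribute, or the property is missing.
    static String removeProperty(String html, String property) {
        int tagStart = html.indexOf("<span");
        if (tagStart < 0) return html;
        int tagEnd = html.indexOf('>', tagStart);
        int styleStart = html.indexOf(STYLE_OPEN, tagStart);
        if (tagEnd < 0 || styleStart < 0 || styleStart > tagEnd) return html;

        int valueStart = styleStart + STYLE_OPEN.length();
        int valueEnd = html.indexOf('"', valueStart);
        if (valueEnd < 0) return html;
        String value = html.substring(valueStart, valueEnd);

        // Search only inside the style value, so text content is never touched.
        int declStart = value.indexOf(property + ":");
        if (declStart < 0) return html;
        int semicolon = value.indexOf(';', declStart);
        int declEnd = (semicolon < 0) ? value.length() : semicolon + 1;

        String cleaned = value.substring(0, declStart) + value.substring(declEnd);
        return html.substring(0, valueStart) + cleaned + html.substring(valueEnd);
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        String result = removeProperty(removeProperty(htmlFragment, "cursor"), "height");
        // prints: <span id="nav-askquestion" style="width:200px;" name="questions"> <b>hh</b></span>
        System.out.println(result);
    }
}
```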