Python's re.sub() -> Java

Question

What would be the Java equivalent of:

import re

def filt_out(s):
    return re.sub('<a href="(.*)">', '', s.replace('<br/>', '\n').replace('&quot;', '"').replace('</a>', ''))

I'd also recommend using an actual HTML parser for processing HTML — millimoose, Oct 05 '12 at 11:01
[Use an XML parser](http://stackoverflow.com/a/1732454/647772) — , Oct 05 '12 at 11:10
At the *very* least, use `.*?` instead of `.*`. Otherwise, you'll get problems when you have more than one anchor tag on one line in your HTML file. — Tim Pietzcker, Oct 05 '12 at 11:14
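To illustrate the comment above, here is a small sketch (the class name, method names, and sample line are hypothetical) that applies the question's greedy pattern and a reluctant `.*?` variant to a single line containing two anchors. For this pattern, Java's `String.replaceAll` treats `.*` and `.*?` the same way Python's `re` module does.

```java
public class GreedyVsReluctant {
    // Hypothetical input: two anchor tags on one line
    static final String SAMPLE = "<a href=\"a.html\">A</a> and <a href=\"b.html\">B</a>";

    // Greedy .* backtracks from the end of the line to the LAST "> it can find,
    // so everything from the first <a href=" up to that point is removed
    static String stripGreedy(String s) {
        return s.replaceAll("<a href=\"(.*)\">", "");
    }

    // Reluctant .*? stops at the FIRST "> after each <a href="
    static String stripReluctant(String s) {
        return s.replaceAll("<a href=\"(.*?)\">", "");
    }

    public static void main(String[] args) {
        System.out.println(stripGreedy(SAMPLE));    // B</a>
        System.out.println(stripReluctant(SAMPLE)); // A</a> and B</a>
    }
}
```

With the greedy pattern, the link text "A" and the text between the two anchors disappear along with the tags.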

score 8 · Answer 1 · edited May 23 '17 at 12:27


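public static String filtOut(String s) {
    // Literal replacements first, in the same order as the Python version
    String t = s.replace("<br/>", "\n").replace("&quot;", "\"").replace("</a>", "");
    // Then strip the opening anchor tags with the same (greedy) pattern as re.sub
    return t.replaceAll("<a href=\"(.*)\">", "");
}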
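One detail matters when translating: the Python function performs its three literal replacements before calling `re.sub`, so every `<br/>` has already become a newline when the regex runs, and `.` does not match a newline in either language. The sketch below (class and method names and the sample input are hypothetical) compares that order with running the regex first. It uses `String.replace` for the literal substitutions because, unlike `replaceAll`, it does not interpret its arguments as a regex.

```java
public class FiltOutOrder {
    private static final String ANCHOR = "<a href=\"(.*)\">";

    // Same order as the Python code: literal replacements, then the regex
    static String literalsFirst(String s) {
        String t = s.replace("<br/>", "\n").replace("&quot;", "\"").replace("</a>", "");
        return t.replaceAll(ANCHOR, "");
    }

    // Regex first, then the literal replacements
    static String regexFirst(String s) {
        String t = s.replaceAll(ANCHOR, "");
        return t.replace("<br/>", "\n").replace("&quot;", "\"").replace("</a>", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: two anchors separated by <br/> on one physical line
        String in = "<a href=\"a\">x</a><br/><a href=\"b\">y</a>";
        System.out.println(literalsFirst(in).equals("x\ny")); // true
        System.out.println(regexFirst(in).equals("y"));       // true: the greedy match swallowed "x"
    }
}
```

When the regex runs first, no newline exists yet, so the greedy `.*` spans the `<br/>` and removes `x` along with both opening tags.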

That said, neither this code style nor the approach in general is recommended. You should usually use a dedicated HTML parser to process HTML; regular expressions are too limited for that task.
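As a rough illustration of the parser-based approach, here is a sketch that uses the JDK's built-in DOM parser (`javax.xml.parsers`) rather than an HTML parser, in the spirit of the "use an XML parser" comment on the question. It assumes the input is a well-formed XHTML fragment that only uses XML's predefined entities such as `&quot;`. Real-world HTML with unclosed tags or entities like `&nbsp;` would make it throw, which is where a lenient HTML parser is needed. The class name and the wrapping `<root>` element are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FiltOutXml {
    // Assumes s is a well-formed XHTML fragment; it is wrapped in a hypothetical
    // <root> element so the parser sees a single document element.
    public static String filtOut(String s) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element root = builder.parse(new InputSource(new StringReader("<root>" + s + "</root>")))
                .getDocumentElement();
        StringBuilder out = new StringBuilder();
        collect(root, out);
        return out.toString();
    }

    // Walks the tree: keeps text, turns <br/> into a newline, drops other tags
    private static void collect(Node node, StringBuilder out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    // Entities such as &quot; have already been decoded by the parser
                    out.append(child.getNodeValue());
                    break;
                case Node.ELEMENT_NODE:
                    if ("br".equals(child.getNodeName())) {
                        out.append('\n');
                    } else {
                        // <a> and any other element: drop the tag, keep its text
                        collect(child, out);
                    }
                    break;
                default:
                    // Comments, processing instructions, etc. are ignored
                    break;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(filtOut("<a href=\"x\">say &quot;hi&quot;</a><br/>bye"));
    }
}
```

Because the parser understands tag boundaries, multiple anchors on one line need no special handling.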

You can look at the following questions on HTML parsers:


answered Oct 05 '12 at 10:58 by Rorick



+1 Definitely recommend an HTML parser instead, but would change this slightly to a non-greedy `(.*?)` – Jon Clements Oct 05 '12 at 11:27

Good point. There's a lot to be improved here, so I've just translated it to Java as is =) Instead of a reluctant quantifier, I would change `\".*\"` to `\"[^\"]*\"`, since I often find that such quantifiers are hard for me and my colleagues to grasp =) – Rorick Oct 05 '12 at 12:22
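A small sketch comparing the reluctant pattern with the negated character class suggested in the comment (the class name, method names, and inputs are hypothetical). On simple anchors both remove the same tags. They differ when another attribute follows `href`: `[^"]*` cannot cross the closing quote of the href value, so that tag is left in place, whereas `.*?` keeps expanding until it reaches the tag's closing `">`.

```java
public class AnchorPatterns {
    // Reluctant quantifier: expand until the first "> is reached
    static String stripReluctant(String s) {
        return s.replaceAll("<a href=\"(.*?)\">", "");
    }

    // Negated character class: the href value may not contain a double quote
    static String stripNegated(String s) {
        return s.replaceAll("<a href=\"([^\"]*)\">", "");
    }

    public static void main(String[] args) {
        String simple = "<a href=\"a.html\">A</a> and <a href=\"b.html\">B</a>";
        String extraAttr = "<a href=\"c.html\" title=\"C\">C</a>";
        System.out.println(stripReluctant(simple));    // A</a> and B</a>
        System.out.println(stripNegated(simple));      // A</a> and B</a>
        System.out.println(stripReluctant(extraAttr)); // C</a>
        System.out.println(stripNegated(extraAttr));   // unchanged: the pattern does not match
    }
}
```

Neither variant handles every valid anchor tag, which is one more reason the answer and comments recommend a real parser.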

