What would be the Java equivalent of:

import re

def filt_out(s):
    return re.sub('<a href="(.*)">', '', s.replace('<br/>', '\n').replace('&quot;', '"').replace('</a>', ''))

1 Answer

public static String filtOut(String s) {
    // Same order as the Python version: literal replacements first, then the regex.
    return s.replace("<br/>", "\n").replace("&quot;", "\"").replace("</a>", "")
            .replaceAll("<a href=\"(.*)\">", "");
}

That said, neither this code style nor the approach in general is recommended. HTML should normally be processed with a dedicated HTML parser; regular expressions are too limited for that task.

There are existing questions on HTML parsers that are worth a look.

Rorick
  • +1 Definitely recommend an HTML parser instead but would change this slightly to a non-greedy `(.*?)` – Jon Clements Oct 05 '12 at 11:27
  • Good point. There's a lot to be improved here, so I've just translated to Java as is =) I would change `\".*\"` to `\"[^\"]*\"` instead of reluctant quantifier since I often find that alternative quantifiers are hard to grasp for me and for my colleagues =) – Rorick Oct 05 '12 at 12:22
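
To make the quantifier discussion in the comments concrete, here is a small sketch comparing the greedy `(.*)` from the answer, the reluctant `(.*?)`, and the negated character class `[^\"]*`. The class name `AnchorRegexDemo`, its method names, and the sample input are made up for illustration; only the three patterns come from the thread.

```java
// Hypothetical demo class: compares the three patterns mentioned in the comments.
class AnchorRegexDemo {

    // Greedy: .* runs to the end of the line, then backtracks to the LAST "> it can find.
    static String greedy(String s) {
        return s.replaceAll("<a href=\"(.*)\">", "");
    }

    // Reluctant: .*? stops at the FIRST "> after the opening <a href=".
    static String reluctant(String s) {
        return s.replaceAll("<a href=\"(.*?)\">", "");
    }

    // Negated class: the href value may not contain a quote, so the match
    // cannot run past the closing quote of the attribute.
    static String negated(String s) {
        return s.replaceAll("<a href=\"[^\"]*\">", "");
    }

    public static void main(String[] args) {
        // Illustrative input with two links on one line.
        String s = "<a href=\"a.html\">A</a> and <a href=\"b.html\">B</a>";
        System.out.println(greedy(s));    // B</a>
        System.out.println(reluctant(s)); // A</a> and B</a>
        System.out.println(negated(s));   // A</a> and B</a>
    }
}
```

With two links on the same line, the greedy version swallows everything between the first `<a href="` and the last `">`, while the other two remove each opening tag separately. The two alternatives are not identical, though: for a tag with extra attributes such as `<a href="a.html" class="x">`, the reluctant pattern still removes the whole tag, whereas the negated class leaves it untouched because `">` does not directly follow the href value. Either way, this only illustrates regex behavior and does not replace a real HTML parser.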