11

I'm hoping someone has already written this:

A servlet filter that can be configured with regular expression search/replace patterns and applies them to the HTML output.

Does such a thing exist?

Jeremy Stein
  • What exactly do you want to change? The request URL or the response body? Tuckey's UrlRewriteFilter is excellent, but it is intended to rewrite URLs (as is possible with Apache HTTPD's well-known RewriteRule). To change the response body, you'll have to be more specific about the functional requirement. No such filter comes to mind, but this smells too much like sanitizing user-controlled input to prevent XSS. In that case, regex is absolutely the wrong tool for the job. – BalusC Feb 16 '11 at 00:12
  • I'm sorry I was unclear. I've edited the question to indicate that I want to modify the HTML output. – Jeremy Stein Feb 17 '11 at 13:59
  • What exactly in the HTML output? Since using regex to parse and modify HTML is an extremely poor practice, no such filter was ever written. Please clarify the functional requirement more. Why would you need a filter for this? Why not just make changes straight in the view side? Etc. – BalusC Feb 17 '11 at 15:14
  • We want to incorporate a vendor's JSP-based web application into our own through frames. We need to remove every `target="_parent"` from their output. They gave us only the compiled JSPs. I think the easiest way to make the change is to add a filter that modifies the output. – Jeremy Stein Feb 18 '11 at 18:47

3 Answers

15

I couldn't find one, so I wrote one:

RegexFilter.java

package com.example;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

/**
 * Applies search and replace patterns. To initialize this filter, the
 * param-names should be "search1", "replace1", "search2", "replace2", etc.
 */
public final class RegexFilter implements Filter {
    private List<Pattern> searchPatterns;
    private List<String> replaceStrings;

    /**
     * Finds the search and replace strings in the configuration file. Looks for
     * matching searchX and replaceX parameters.
     */
    public void init(FilterConfig filterConfig) {
        Map<String, String> patternMap = new HashMap<String, String>();

        // Walk through the parameters to find those whose names start with
        // search
        Enumeration<String> names = (Enumeration<String>) filterConfig.getInitParameterNames();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            if (name.startsWith("search")) {
                patternMap.put(name.substring(6), filterConfig.getInitParameter(name));
            }
        }
        this.searchPatterns = new ArrayList<Pattern>(patternMap.size());
        this.replaceStrings = new ArrayList<String>(patternMap.size());

        // Walk through the parameters again to find the matching replace params
        names = (Enumeration<String>) filterConfig.getInitParameterNames();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            if (name.startsWith("replace")) {
                String searchString = patternMap.get(name.substring(7));
                if (searchString != null) {
                    this.searchPatterns.add(Pattern.compile(searchString));
                    this.replaceStrings.add(filterConfig.getInitParameter(name));
                }
            }
        }
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        // Wrap the response in a wrapper so we can get at the text after calling the next filter
        PrintWriter out = response.getWriter();
        CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) response);
        chain.doFilter(request, wrapper);

        // Extract the text from the completed servlet and apply the regexes
        String modifiedHtml = wrapper.toString();
        for (int i = 0; i < this.searchPatterns.size(); i++) {
            modifiedHtml = this.searchPatterns.get(i).matcher(modifiedHtml).replaceAll(this.replaceStrings.get(i));
        }

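        // Write our modified text to the real response. The length is computed with
        // the response's own character encoding rather than the platform default,
        // so that Content-Length matches the bytes the writer actually sends.
        response.setContentLength(modifiedHtml.getBytes(response.getCharacterEncoding()).length);
        out.write(modifiedHtml);
        out.flush();
        out.close();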
    }

    public void destroy() {
        this.searchPatterns = null;
        this.replaceStrings = null;
    }
}
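
One caveat about the filter above: the search/replace pairs are applied in whatever order the container enumerates the init parameters, and that order is not guaranteed. When one rule depends on the output of another, it may help to sort by the numeric suffix first. The following is only a sketch of that idea, not part of the original filter: the class `OrderedRules` and its methods are hypothetical names, and a plain `Map` stands in for `FilterConfig` so the example runs without a servlet container. Unlike the original, it skips parameters whose suffix is not a number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class OrderedRules {
    /** One compiled search pattern together with its replacement string. */
    static final class Rule {
        final Pattern search;
        final String replace;

        Rule(Pattern search, String replace) {
            this.search = search;
            this.replace = replace;
        }
    }

    /** Pairs searchN with replaceN and returns the rules sorted by N. */
    static List<Rule> fromParams(Map<String, String> params) {
        TreeMap<Integer, Rule> ordered = new TreeMap<>();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = entry.getKey();
            if (!name.startsWith("search")) {
                continue;
            }
            String suffix = name.substring("search".length());
            String replace = params.get("replace" + suffix);
            if (replace == null) {
                continue; // no matching replaceN parameter
            }
            try {
                int index = Integer.parseInt(suffix);
                ordered.put(index, new Rule(Pattern.compile(entry.getValue()), replace));
            } catch (NumberFormatException ignored) {
                // Assumption of this sketch: non-numeric suffixes are skipped.
            }
        }
        return new ArrayList<>(ordered.values());
    }

    /** Applies every rule in order to the given text. */
    static String apply(List<Rule> rules, String text) {
        for (Rule rule : rules) {
            text = rule.search.matcher(text).replaceAll(rule.replace);
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("search2", "b");
        params.put("replace2", "c");
        params.put("search1", "a");
        params.put("replace1", "b");
        // Rule 1 turns "a" into "b", then rule 2 turns "b" into "c".
        System.out.println(apply(fromParams(params), "a")); // prints "c"
    }
}
```

Inside the real filter, the same pairing could be fed from `filterConfig.getInitParameterNames()` and `getInitParameter(name)` in `init`.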

CharResponseWrapper.java

package com.example;

import java.io.CharArrayWriter;
import java.io.PrintWriter;

import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

/**
 * Wraps the response object to capture the text written to it.
 */
public class CharResponseWrapper extends HttpServletResponseWrapper {
    private CharArrayWriter output;

    public CharResponseWrapper(HttpServletResponse response) {
        super(response);
        this.output = new CharArrayWriter();
    }

    public String toString() {
        return output.toString();
    }

    public PrintWriter getWriter() {
        return new PrintWriter(output);
    }
}

Example web.xml

<web-app>
    <filter>
      <filter-name>RegexFilter</filter-name>
      <filter-class>com.example.RegexFilter</filter-class>
      <init-param><param-name>search1</param-name><param-value><![CDATA[(<\s*a\s[^>]*)(?<=\s)target\s*=\s*(?:'_parent'|"_parent"|_parent|'_top'|"_top"|_top)]]></param-value></init-param>
      <init-param><param-name>replace1</param-name><param-value>$1</param-value></init-param>
    </filter>
    <filter-mapping>
      <filter-name>RegexFilter</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>
</web-app>
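
To see what the `search1`/`replace1` pair in this web.xml does, the pattern can be tried outside the container. This is just an illustrative sketch: the class name `TargetStripDemo` and the sample HTML are made up for the example, and the pattern is copied from the configuration above with Java string escaping added.

```java
import java.util.regex.Pattern;

public class TargetStripDemo {
    // Same expression as the search1 init-param, escaped for a Java string literal.
    static final Pattern SEARCH = Pattern.compile(
            "(<\\s*a\\s[^>]*)(?<=\\s)target\\s*=\\s*"
            + "(?:'_parent'|\"_parent\"|_parent|'_top'|\"_top\"|_top)");

    // Same value as the replace1 init-param: keep only the text before the attribute.
    static final String REPLACE = "$1";

    static String strip(String html) {
        return SEARCH.matcher(html).replaceAll(REPLACE);
    }

    public static void main(String[] args) {
        String html = "<a href=\"x.jsp\" target=\"_parent\">Link</a>";
        // The attribute is removed; the space that preceded it stays behind.
        System.out.println(strip(html)); // prints <a href="x.jsp" >Link</a>
    }
}
```

Group 1 captures everything from `<a` up to the whitespace before `target`, so replacing the match with `$1` drops only the attribute itself.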
Jeremy Stein
  • Awesome stuff, just used this to help me solve a similar issue! – Aaron Silverman Jun 12 '12 at 18:59
  • I would recommend an out.flush() before the out.close() to prevent errors like these: java.net.ProtocolException: Didn't meet stated Content-Length, wrote: '27026' bytes instead of stated: '27023' bytes. – rudolfv Apr 04 '14 at 14:02
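
One plausible cause of a Content-Length mismatch like the one in this comment (an assumption, since the comment does not say what the page contained) is that `String.getBytes()` without arguments uses the platform default charset, which may differ from the encoding the response writer uses. A small sketch, with the hypothetical class name `ContentLengthDemo`, shows how the byte count of the same text changes with the charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentLengthDemo {
    /** Number of bytes the text occupies in the given charset. */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // 11 characters, one of which (e with acute accent) is outside ASCII.
        String html = "<p>caf\u00e9</p>";
        System.out.println(byteLength(html, StandardCharsets.ISO_8859_1)); // prints 11
        System.out.println(byteLength(html, StandardCharsets.UTF_8));      // prints 12
    }
}
```

If the length is declared with one charset and the body is written with another, the stated and actual byte counts can differ by a few bytes, as in the error message.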
5

I am not sure if this is what you are looking for, but there is a URL rewrite filter that supports regex. Please see http://www.tuckey.org/urlrewrite/

Hope this helps.

Nishant
  • This library supports not only the rewriting of incoming URLs but also the modification of links on the HTML page: http://urlrewritefilter.googlecode.com/svn/trunk/src/doc/manual/4.0/index.html#outbound-rule Nice. – rwitzel Jan 28 '13 at 08:39
2

SiteMesh is popular for this type of work.

SiteMesh has moved into a standalone project: http://www.sitemesh.org/

Mindwin Remember Monica
Uriah Carpenter