
I am building a service with Spring MVC, set up using Spring Boot, where I want to be able to have arbitrary Unicode characters in the URLs.

After looking around the web, I ended up with

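```java
import javax.servlet.Filter;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.filter.CharacterEncodingFilter;

@Configuration
@ComponentScan
@EnableAutoConfiguration
public class Main {
    public static void main(String... args) throws Exception {
        SpringApplication.run(Main.class, args);
    }

    // Registers a filter that forces UTF-8 as the request/response character encoding
    @Bean
    public Filter characterEncodingFilter() {
        CharacterEncodingFilter characterEncodingFilter = new CharacterEncodingFilter();
        characterEncodingFilter.setEncoding("UTF-8");
        characterEncodingFilter.setForceEncoding(true);
        return characterEncodingFilter;
    }
}
```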

and

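```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller
public class WordController {

    // Handles GET /word/{word} and renders the "word" view with the path variable
    @RequestMapping(value="/word/{word}", method=RequestMethod.GET)
    public String greeting(Model model, @PathVariable("word") String word) {

        System.out.println(word);

        model.addAttribute("word", word);
        return "word";
    }

}
```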

where the template "word" just prints out the word from the model.

When I start the server and enter http://localhost:8080/word/æøå into Chrome, the text printed on the response page (and in the terminal) is

Ã¦Ã¸Ã¥

which I think I recognize as an ISO-8859-1 interpretation of the Danish letters æøå when they are actually encoded in UTF-8.
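To check that reading, here is a small plain-JDK sketch (the class `MojibakeDemo` and its helpers `misdecode` and `hex` are hypothetical names, not part of Spring) that takes the UTF-8 bytes of "æøå" and decodes them as ISO-8859-1. Unicode escapes are used so the result does not depend on the compiler's source-file encoding.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Render bytes as space-separated uppercase hex, e.g. "C3 A6".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u00e6\u00f8\u00e5"; // "æøå"
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8))); // C3 A6 C3 B8 C3 A5
        // Each two-byte UTF-8 sequence turns into two Latin-1 characters.
        System.out.println(misdecode(original).equals("\u00c3\u00a6\u00c3\u00b8\u00c3\u00a5")); // true ("Ã¦Ã¸Ã¥")
    }
}
```

Each letter is two bytes in UTF-8 (0xC3 followed by a second byte), and ISO-8859-1 maps every byte to one character, so three letters become six.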

Looking into Chrome's network inspector, I see that it actually requests http://localhost:8080/word/%C3%A6%C3%B8%C3%A5, which indeed seems to be the URL encoding of the string in UTF-8.
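The same effect can be reproduced with the JDK alone. The sketch below (hypothetical class `PathDecodingDemo`; it assumes Java 10+ for the `Charset` overloads of `URLEncoder`/`URLDecoder`) percent-encodes "æøå" as UTF-8 and then decodes the escapes with two different charsets. The ISO-8859-1 case is an assumption about what the server did, consistent with the output above and with the ISO-8859-1 `URIEncoding` default of older Tomcat connectors; it is not Tomcat's actual decoding code.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PathDecodingDemo {
    // Percent-decode a segment, interpreting the decoded bytes in the given charset.
    // URLDecoder does form decoding (it also turns '+' into a space), which is
    // harmless here because the segment contains no '+'.
    static String decodeSegment(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String word = "\u00e6\u00f8\u00e5"; // "æøå"
        String encoded = URLEncoder.encode(word, StandardCharsets.UTF_8);
        System.out.println(encoded); // %C3%A6%C3%B8%C3%A5

        // Decoding the escapes as ISO-8859-1 reproduces the garbled path variable...
        System.out.println(decodeSegment(encoded, StandardCharsets.ISO_8859_1)
                .equals("\u00c3\u00a6\u00c3\u00b8\u00c3\u00a5")); // true
        // ...while decoding them as UTF-8 recovers the original word.
        System.out.println(decodeSegment(encoded, StandardCharsets.UTF_8).equals(word)); // true
    }
}
```

The bytes on the wire are identical in both cases; only the charset chosen when turning the decoded bytes back into a `String` differs.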

Can anyone explain why Spring doesn't parse the path variable as UTF-8 despite this configuration, and how to make it do so?

There seem to be mixed opinions on whether `CharacterEncodingFilter` actually solves this problem. In another (non-Boot) Spring project of mine, I use web.xml to register the `CharacterEncodingFilter`. There it successfully makes POST bodies parse as UTF-8, but I couldn't make it work for path variables either.

This answer suggests that it should be configured in Tomcat's settings. If so, how is that done on an embedded server?

Asked by bisgardo

1 Answer


After a brainwave, I found that adding the following bean method

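```java
// Bean method for the @Configuration class (Spring Boot 1.x embedded-container API).
// Imports needed: org.apache.catalina.connector.Connector,
// org.springframework.boot.context.embedded.EmbeddedServletContainerFactory,
// org.springframework.boot.context.embedded.tomcat.TomcatConnectorCustomizer,
// org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedServletContainerFactory
@Bean
public EmbeddedServletContainerFactory servletContainer() {
    TomcatEmbeddedServletContainerFactory factory = new TomcatEmbeddedServletContainerFactory(8080);
    factory.addConnectorCustomizers(new TomcatConnectorCustomizer() {

        @Override
        public void customize(Connector connector) {
            // Decode percent-escaped bytes in request URIs as UTF-8
            connector.setURIEncoding("UTF-8");
        }
    });
    return factory;
}
```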

seems to solve the problem.

Edit

The CharacterEncodingFilter is still necessary for converting POST bodies.

Answered by bisgardo
  • Would that be a sensible default for the rest of the world as well? If so it could be the default for spring boot. – Dave Syer Jan 04 '14 at 22:27
  • So do you need the filter and the connector change or only the connector? – Dave Syer Jan 05 '14 at 09:51
  • The filter should be removed to prevent double-converting POST bodies (it makes no difference regarding the path variable). Thanks for following up on this. – bisgardo Jan 05 '14 at 14:33
  • Oh wait, I seem to have been fooled by a double mis-encoding. Will update the answer shortly – bisgardo Jan 05 '14 at 14:50
  • Well, `CharacterEncodingFilter` is still needed as the edit now says. I was fooled by viewing the response of a `@ResponseBody` method as explained in [this follow-up question](http://stackoverflow.com/questions/20935969/make-responsebody-annotated-spring-boot-mvc-controller-methods-return-utf-8). It turned out that Spring parsed the body (which was really UTF-8) as ISO-8859-1, returned it as such, while Chrome viewed it as UTF-8, cancelling the error. *Sigh*. – bisgardo Jan 05 '14 at 16:40
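The cancellation described in the last comment can be modelled in a few lines of plain Java. This is a simplified sketch of the scenario, not Spring's actual code path; `DoubleMisencodingDemo` and `serverRoundTrip` are hypothetical names. It relies on ISO-8859-1 mapping every byte value from 0x00 to 0xFF to exactly one character, so decoding and re-encoding with it returns the original bytes unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DoubleMisencodingDemo {
    // Simulated server: reads the body as ISO-8859-1 and writes the resulting
    // string back out as ISO-8859-1, so the original bytes are echoed unchanged.
    static byte[] serverRoundTrip(byte[] requestBody) {
        String misread = new String(requestBody, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String word = "\u00e6\u00f8\u00e5"; // "æøå"
        byte[] sent = word.getBytes(StandardCharsets.UTF_8);
        byte[] returned = serverRoundTrip(sent);

        System.out.println(Arrays.equals(sent, returned)); // true
        // Simulated browser: displays the response bytes as UTF-8, so the text looks correct.
        System.out.println(new String(returned, StandardCharsets.UTF_8).equals(word)); // true
    }
}
```

Because the two wrong interpretations are exact inverses of each other, the response looks correct even though the server never held the right string.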