
I have an endpoint which receives a MultipartFile.

    Resource upload(@PathVariable Integer id, @RequestParam MultipartFile file) throws IOException {

This file is usually a .csv, and I need to process every line and save the data.

But recently a user sent a file with UTF-16 LE encoding, and this added a lot of strange characters to the data.

I'd like to accept a file in any encoding and always convert it to my accepted encoding, for example UTF-8, before processing it.

How can I do this?

Guilherme Bernardi
  • You can use a custom CharacterEncodingFilter. [Here's a similar question with an answer](https://stackoverflow.com/questions/24054648/how-to-configure-characterencodingfilter-in-springboot) – mklepa Feb 13 '20 at 12:14
  • @mklepa maybe I misunderstood the other question, but I don't want to add a filter to the whole application. Is it possible with a filter? – Guilherme Bernardi Feb 13 '20 at 12:21
  • @mklepa No, I'm using Spring Boot 1.5.7. I tried both solutions from the question you suggested and they didn't work. – Guilherme Bernardi Feb 13 '20 at 13:13
  • How about this? https://stackoverflow.com/a/48569644/8370004 That way you force the user to use a certain encoding only in this endpoint. – mklepa Feb 13 '20 at 13:13
  • I'll try. I tried the spring.http properties and also the filter, but they didn't work. In my case I'm receiving the MultipartFile and saving it where a Camel route picks up the file and processes the data. – Guilherme Bernardi Feb 13 '20 at 13:16
  • You're using Apache Camel? In that case, you should probably use a different way to create endpoints than the "usual" Spring controller class. Check this documentation https://camel.apache.org/manual/latest/faq/how-do-i-configure-endpoints.html I personally extended a RouteBuilder class and used the from(String url) method, and never had any issues with this. – mklepa Feb 13 '20 at 13:22
  • @mklepa but I think the problem occurs before Apache Camel, because I get the MultipartFile and then I generate a File in a folder that Camel is listening on. By the time I generate this file, the charset is already invalid. – Guilherme Bernardi Feb 13 '20 at 14:34
  • Ok, I just wanted to point out that it is better to create an endpoint using Camel instead of using Spring annotations. If you want to keep the files, then your solution might be better. I'll try to find a solution to your problem (fixing the encoding, but without creating a global filter). Please check if [this solution](https://stackoverflow.com/questions/5928046/spring-mvc-utf-8-encoding/48569644#48569644) works, because it seems to be clean and very simple to implement and test. – mklepa Feb 13 '20 at 15:41
  • @mklepa I already tried this one. It doesn't work. In my case it is an upload, and the encoding inside the file is wrong. – Guilherme Bernardi Feb 13 '20 at 15:58
  • I have another idea. You can create a workaround: after you receive the MultipartFile object, check its encoding, and when it's not UTF-8 return a meaningful response, for example 400 Bad Request with a message about the required encoding. If it's not possible to check the MultipartFile encoding, then convert it to a File and you will be able to check it for sure. It's not the cleanest solution, but it should work. – mklepa Feb 13 '20 at 16:12

1 Answer


After a few tests and some searching, I found the solution.

To change the charset of a file, I need to read it with its source charset and write it back with the target charset. To make this generic enough to accept any charset, I also need to identify the source charset.
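The read-then-write idea can be illustrated in memory before any file handling is involved. This is only a minimal sketch using the JDK; the class name `TranscodeSketch`, the method `transcode`, and the sample CSV content are made up for illustration and are not part of my actual code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {

    // Decode the bytes with the charset they were written in,
    // then encode the resulting text with the target charset.
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        String text = new String(input, source);
        return text.getBytes(target);
    }

    public static void main(String[] args) {
        // Hypothetical CSV content as a client might send it in UTF-16 LE
        byte[] utf16 = "id;name\n1;Jos\u00e9\n".getBytes(StandardCharsets.UTF_16LE);
        byte[] utf8 = transcode(utf16, StandardCharsets.UTF_16LE, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

The sketch assumes the source charset is already known; for an uploaded file it still has to be detected.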

To detect the source charset, I added the juniversalchardet dependency, which provides the UniversalDetector class:

    <dependency>
        <groupId>com.github.albfernandez</groupId>
        <artifactId>juniversalchardet</artifactId>
        <version>2.3.1</version>
    </dependency>

Using it I could do this:

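    String encoding;
    try (InputStream in = file.getInputStream()) {
        // Returns the detected charset name, or null if detection failed
        encoding = UniversalDetector.detectCharset(in);
    }
    if (encoding == null) {
        throw new IllegalArgumentException("Could not detect the charset of the uploaded file");
    }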

And the method to transform the file:

    // Reads the source with its detected charset and rewrites it as ISO-8859-1
    private static void encodeFileInLatinAlphabet(InputStream source, String fromEncoding, File target) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(source, fromEncoding));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target),
                     StandardCharsets.ISO_8859_1))) {
            char[] buffer = new char[16384];
            int read;
            // Copy decoded characters in chunks until the end of the stream
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }

So I can receive a file in any charset and re-encode it in the desired charset.

Note: In my case I always need the file in ISO_8859_1, which is why it is fixed in the method, but you could pass the target charset as a parameter.
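A variant with the target charset as a parameter might look like the sketch below. The class name `CharsetConverter`, the method name `convert`, and the sample data in `main` are assumptions for illustration; only JDK classes are used, and the copy loop is the same as in the method above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class CharsetConverter {

    // Reads the source using the "from" charset and writes the target file using the "to" charset.
    static void convert(InputStream source, Charset from, File target, Charset to) throws IOException {
        try (Reader reader = new BufferedReader(new InputStreamReader(source, from));
             Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), to))) {
            char[] buffer = new char[16384];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical upload content encoded as UTF-16 LE
        byte[] upload = "id;name\n1;Jos\u00e9\n".getBytes(StandardCharsets.UTF_16LE);
        File target = File.createTempFile("converted", ".csv");
        target.deleteOnExit();
        convert(new ByteArrayInputStream(upload), StandardCharsets.UTF_16LE, target, StandardCharsets.UTF_8);
        System.out.println(new String(Files.readAllBytes(target.toPath()), StandardCharsets.UTF_8));
    }
}
```

The detected charset name can be turned into a `Charset` with `Charset.forName(encoding)`. Keep in mind that characters that do not exist in the target charset (for example, non-Latin text written as ISO_8859_1) are replaced by the writer, typically with `?`.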

Guilherme Bernardi