I built an app using PlayFramework 2.3 and at some point I upload a CSV file and populate a database with it.

When accessing the app locally (127.0.0.1:9000) and doing the upload, everything works fine: the file is uploaded, parsed, and added to the database without any problem.

When I follow the same procedure in production, all the accented characters are replaced with ��.

The main difference between dev and prod is:

  • In DEV, I access the app directly from PlayFramework (localserver)
  • In PROD, I access the app through NGinx, which proxies requests to a local instance of Play.

Here are the details:

  • The CSV file is UTF-8 encoded (and of course I test with the same file in both environments)
  • The connection to the database is made using UTF-8 -> db.default.url="jdbc:mysql://127.0.0.1/2leadin?characterEncoding=UTF-8"
  • I checked with Firefox: the HTML page is returned in UTF-8

Finally, here's my NGinx configuration:

proxy_buffering    off;
proxy_set_header   X-Real-IP $remote_addr;
proxy_set_header   X-Scheme "https";
proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header   Host $http_host;
proxy_http_version 1.1;

server {
        listen 80;
        server_name my.2lead.in;
        return      301 https://my.2lead.in;
}

server {
    listen               443;
    ssl                  on;
    root                 /var/www/2lead.in/errors/;

    # http://www.selfsignedcertificate.com/ is useful for development testing
    ssl_certificate      /ssl/2lead.crt;
    ssl_certificate_key  /ssl/2lead.key;

    # From https://bettercrypto.org/static/applied-crypto-hardening.pdf
    ssl_prefer_server_ciphers on;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # not possible to do exclusive
    ssl_ciphers 'EDH+CAMELLIA:EDH+aRSA:EECDH+aRSA+AESGCM:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH:+CAMELLIA256:+AES256:+CAMELLIA128:+AES128:+SSLv3:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!DSS:!RC4:!SEED:!ECDSA:CAMELLIA256-SHA:AES256-SHA:CAMELLIA128-SHA:AES128-SHA';
    add_header Strict-Transport-Security max-age=15768000; # six months
    # use this only if all subdomains support HTTPS!
    # add_header Strict-Transport-Security "max-age=15768000; includeSubDomains"

    keepalive_timeout    70;
    server_name my.2lead.in;

    # remove the robots line if you want to use wordpress' virtual robots.txt
    location = /robots.txt  { access_log off; log_not_found off; }
    location = /favicon.ico { access_log off; log_not_found off; }

    location /public {
        alias /var/www/2lead.in/my/public/;
        access_log off;
        log_not_found off;
    }

    location / {
        proxy_pass  http://127.0.0.1:9100;
    }

    location ~ /\.git {
        deny all;
    }

    error_page 502 @maintenance;
    location @maintenance {
        rewrite ^(.*)$ /error502.html break;
    }
}

What am I missing? Do you have any idea why I have the encoding issue only in PROD? I'm pretty sure it's because of NGinx, but I can't find the reason.

Thank you.

Cyril N.
  • Presumably your app is assuming the default charset in one of its file operations, and the default is different on the production server and your local machine (check the `file.encoding` system property; it's locale dependent). There are a number of places where it's possible to omit an explicit charset and end up with these portability issues, but without seeing your code it's hard to know. I'd be surprised if it was NGinx. – Mikesname Feb 07 '15 at 19:28
  • I wrote a simple HelloWorld.java with `System.out.println(System.getProperty("file.encoding"));` in it. On the local server, I got `UTF-8` as output; in prod, I got `ANSI_X3.4-1968`. I tried to force the encoding with `java -DFile.encoding=UTF-8 HelloWorld`, with the same result (ANSI_X3.4-1968). Why? How? :/ – Cyril N. Feb 09 '15 at 09:57
  • Wait, that was a bad command line: with `java -Dfile.encoding=UTF-8 HelloWorld` (lowercase f in `file`), the output is UTF-8. I'll restart the app and see if this fixes the issue. – Cyril N. Feb 09 '15 at 09:59
  • Well, you nailed it @Mikesname, that was the issue! If you post it as an answer to my question, I'll accept it. Thank you! – Cyril N. Feb 10 '15 at 10:08

1 Answer

The default Java character set is locale-dependent and taken from the `file.encoding` system property (see this answer). This can cause behaviour to differ between machines, which is what you are seeing. There are two ways to fix it, the expedient way and the more robust and portable way:

  • make sure your server is running with -Dfile.encoding=UTF-8 (or whatever matches your dev environment)
  • ensure that all your file operations specify the charset explicitly, as this answer describes
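As a minimal sketch of the second approach, assuming the upload is available as an `InputStream` (the class name `CsvReaderExample` and the helper `readLinesUtf8` are hypothetical, not taken from the question's code), the reader below always decodes as UTF-8, so the result should not depend on how the JVM was started:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CsvReaderExample {

    // Hypothetical helper: reads every line of the stream, decoding as UTF-8
    // explicitly instead of relying on the JVM default charset.
    static List<String> readLinesUtf8(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample CSV bytes; \u00e9 is "e with acute accent", written as an
        // escape so the source file itself stays pure ASCII.
        byte[] csv = "name;city\nRen\u00e9;Orl\u00e9ans\n"
                .getBytes(StandardCharsets.UTF_8);
        for (String line : readLinesUtf8(new ByteArrayInputStream(csv))) {
            System.out.println(line);
        }
    }
}
```

The same idea applies to any other place where a charset can be omitted, such as `new String(bytes)` or `String.getBytes()`: passing `StandardCharsets.UTF_8` removes the dependency on the server's locale.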

In summary, relying on the default system encoding is fragile and should be avoided in most cases.
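To illustrate the fragility, here is a small sketch (the class `DefaultCharsetDemo` and its `decode` helper are made up for this example) that prints the JVM's defaults and then decodes the UTF-8 bytes of one accented character twice. `ANSI_X3.4-1968`, the value reported on the production server, is an alias for US-ASCII, so the ASCII decode stands in for what plausibly happened there:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Hypothetical helper: decodes bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // What this JVM uses when no charset is passed explicitly.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // "e with acute accent" is two bytes in UTF-8: 0xC3 0xA9.
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Neither byte is valid ASCII, so each one becomes U+FFFD,
        // the replacement character.
        String asAscii = decode(utf8, StandardCharsets.US_ASCII);
        String asUtf8 = decode(utf8, StandardCharsets.UTF_8);

        System.out.println("ASCII decode gives two U+FFFD: "
                + asAscii.equals("\uFFFD\uFFFD"));
        System.out.println("UTF-8 decode round-trips:      "
                + asUtf8.equals("\u00e9"));
    }
}
```

Running it on both machines, with and without `-Dfile.encoding=UTF-8`, is a quick way to compare the defaults each JVM picks up.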

Mikesname