I am working on a Perl + Mojolicious web application, and my front end sends a POST request containing accented characters in an "a" parameter ("été") using charset UTF-8, as I can see in the Chrome network tab. But the server-side script decodes that parameter using a charset that I didn't expect.
I wrote the following script to reproduce the case.
use utf8;    # script encoded in UTF-8 without BOM
use Mojolicious::Lite;
use Data::HexDump;
use Devel::Peek;
use Mojo::UserAgent;

# Print the environment: Perl version, Mojolicious version, console code page (Windows).
{
    require Mojolicious;
    say "perl $^V, Mojolicious: v", Mojolicious->VERSION, ", ", `chcp`;
}

post '/' => sub {
    my $self   = shift;
    my $params = $self->req->params->to_hash;

    # Inspect the received parameter: hex dump plus Perl's internal representation.
    app->log->debug( "received data:\n", HexDump( $params->{a} ) );
    Dump( $params->{a} );
    $self->render( text => "ok for '$params->{a}'" );
};

# Parent process: act as the client. The child falls through and runs the server.
if ( my $pid = fork() ) {
    my $t = Mojo::UserAgent->new;

    # Simulate the front-end query.
    my $tx = $t->post(
        'http://127.0.0.1:3042/' =>
            { 'Content-Type' => 'application/x-www-form-urlencoded; charset=UTF-8' },
        form => { a => 'été' }
    );
    my $res = $tx->res->body;
    say "result:\n", HexDump($res);
    Dump($res);
    kill 'SIGKILL', $pid;
    exit(0);
}

app->start(qw(daemon --listen http://*:3042));
The output of this script was:
perl v5.20.1, Mojolicious: v6.05, Page de codes active : 850
[Tue May 26 12:31:15 2015] [info] Listening at "http://*:3042"
Server available at http://127.0.0.1:3042
[Tue May 26 12:31:16 2015] [debug] Your secret passphrase needs to be changed
[Tue May 26 12:31:16 2015] [debug] POST "/"
[Tue May 26 12:31:16 2015] [debug] Routing to a callback
[Tue May 26 12:31:16 2015] [debug] received data:
          00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F  0123456789ABCDEF
00000000  E9 74 E9                                           .t.
SV = PVMG(0x5a7a198) at 0x4dce730
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x5b62c48 "\303\251t\303\251"\0 [UTF8 "\x{e9}t\x{e9}"]
  CUR = 5
  LEN = 10
[Tue May 26 12:31:16 2015] [debug] 200 OK (0.005052s, 197.941/s)
result:
          00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F  0123456789ABCDEF
00000000  6F 6B 20 66 6F 72 20 27 - C3 A9 74 C3 A9 27        ok for '..t..'
SV = PV(0x41a73e8) at 0x4927070
  REFCNT = 1
  FLAGS = (PADMY,POK,IsCOW,pPOK)
  PV = 0x5aa1328 "ok for '\303\251t\303\251'"\0
  CUR = 14
  LEN = 16
  COW_REFCNT = 1
So we can see that the server receives the "a" parameter in a string flagged as UTF8 that contains the buffer "\x{e9}t\x{e9}".
I was expecting "été" with the hex bytes "C3 A9 74 C3 A9".
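To make the comparison concrete, here is a minimal standalone sketch (not part of the application) of the two representations involved: a character string holding the three code points of "été", and the same text encoded as UTF-8 octets. It uses only the core Encode module. `hex_of` is a hypothetical helper written for this illustration, not a library function.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character string for "été": three code points (U+00E9, 't', U+00E9).
my $chars = "\x{e9}t\x{e9}";

# Byte string: the same text encoded as UTF-8 octets.
my $bytes = encode('UTF-8', $chars);

# Hypothetical helper: print the ordinal of each element of a string in hex.
sub hex_of { join ' ', map { sprintf '%02X', ord } split //, shift }

print "characters: ", length($chars), " -> ", hex_of($chars), "\n";
print "bytes:      ", length($bytes), " -> ", hex_of($bytes), "\n";
```

This prints `E9 74 E9` for the 3-character string and `C3 A9 74 C3 A9` for the 5-byte encoded string. If Data::HexDump works on character ordinals rather than on the internal UTF-8 buffer (an assumption I have not verified), that would match the `E9 74 E9` dump on the server side.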
What is wrong?