In bash, I can get the hexdump of the string hello
as UTF-16 by doing the following:
$ echo -n "hello" | iconv -f ascii -t utf-16 | hexdump
0000000 feff 0068 0065 006c 006c 006f
000000c
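Note that hexdump's default format prints the input as two-byte units in the host's byte order, so on a little-endian machine the bytes ff fe are displayed as the word feff. A minimal C sketch of that regrouping, using the 12 bytes the pipeline produces (byte values taken from the hexdump -C output in the update below):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* BOM + "hello" in UTF-16LE, i.e. the 12 bytes written by the
       iconv pipeline (see the hexdump -C output in the update). */
    const unsigned char bytes[] = {
        0xff, 0xfe, 0x68, 0x00, 0x65, 0x00,
        0x6c, 0x00, 0x6c, 0x00, 0x6f, 0x00
    };

    /* Regroup into two-byte units the way hexdump's default format
       does; on a little-endian host the pair 0xff, 0xfe prints as feff. */
    for (size_t i = 0; i + 1 < sizeof bytes; i += 2) {
        uint16_t word;
        memcpy(&word, bytes + i, sizeof word);
        printf("%04x ", word);
    }
    putchar('\n');   /* feff 0068 0065 006c 006c 006f on little-endian */
    return 0;
}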
I can also write a short C program like so:
/* hexDump() is the routine from the linked question; this declaration
   is assumed to match it. */
void hexDump(const char *desc, const void *addr, int len);

int main(int argc, char **argv) {
    char *str = argv[1];
    hexDump("The string", str, 12);
    return 0;
}
using the hexDump routine from "how to get hexdump of a structure data". The 12 is the number of bytes I counted from the hexdump output above.
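In case it matters, a minimal stand-in for hexDump (not the exact routine from the linked question, just something with the same shape of output: offset, hex bytes, then an ASCII column) would look like this:

#include <ctype.h>
#include <stdio.h>

/* Minimal stand-in for hexDump(): prints a caption, then lines of
   offset, hex bytes, and an ASCII column (non-printable bytes as '.'). */
void hexDump(const char *desc, const void *addr, int len) {
    const unsigned char *p = (const unsigned char *)addr;
    printf("%s:\n", desc);
    for (int i = 0; i < len; i += 16) {
        printf("%04x ", i);
        for (int j = i; j < i + 16 && j < len; j++)
            printf("%02x ", p[j]);
        printf(" ");
        for (int j = i; j < i + 16 && j < len; j++)
            putchar(isprint(p[j]) ? p[j] : '.');
        putchar('\n');
    }
}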
Compile and run:
$ gcc test.c -o test
$ ./test $(echo -n hello | iconv -f ascii -t utf-16)
The string:
0000 ff fe 68 65 6c 6c 6f 00 53 53 48 5f ..hello.SSH_
Why is there a difference between the first hex string, feff 0068 0065 006c 006c 006f, and the second hex string, ff fe 68 65 6c 6c 6f 00 53 53 48 5f?
I am asking this because I am trying to debug an application that uses libiconv to convert a UTF-16 string to UTF-8 and keep getting an errno of EILSEQ, which means that libiconv has come across an "invalid multibyte sequence."
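For context, the conversion in the application is essentially the standard iconv_open/iconv sequence. This is only an illustrative sketch (the buffer sizes and the exact "UTF-16"/"UTF-8" encoding names are my assumptions), but it shows where the EILSEQ comes back:

#include <errno.h>
#include <iconv.h>
#include <stdio.h>

int main(void) {
    /* BOM + "hello" in UTF-16LE, matching the pipeline output. */
    char in[] = "\xff\xfe\x68\x00\x65\x00\x6c\x00\x6c\x00\x6f\x00";
    char out[64];

    char *inp = in, *outp = out;
    size_t inleft = sizeof in - 1;   /* 12 bytes, minus the trailing NUL */
    size_t outleft = sizeof out;

    iconv_t cd = iconv_open("UTF-8", "UTF-16");   /* tocode, fromcode */
    if (cd == (iconv_t)-1) {
        perror("iconv_open");
        return 1;
    }

    /* iconv() returns (size_t)-1 and sets errno to EILSEQ when it
       encounters an invalid multibyte sequence in the input. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        if (errno == EILSEQ)
            fprintf(stderr, "invalid multibyte sequence\n");
        else
            perror("iconv");
        iconv_close(cd);
        return 1;
    }

    printf("converted: %.*s\n", (int)(outp - out), out);
    iconv_close(cd);
    return 0;
}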
UPDATE:
If I run hexdump with -C, I get the following output:
$ echo -n hello | iconv -f ascii -t utf-16 | hexdump -C
00000000 ff fe 68 00 65 00 6c 00 6c 00 6f 00 |..h.e.l.l.o.|
0000000c
This hex string is still different from the one my C program produces in that it includes the \x00 bytes interspersed between the ASCII characters. When I run the C program, however, there are no \x00 bytes interspersed at all; it just has the ff fe header and then the regular ASCII characters.
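One extra check I could add on my side (just a sketch) is to print strlen(argv[1]) before calling hexDump with a hard-coded 12, to see how many bytes of the argument actually precede the first \x00:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <string>\n", argv[0]);
        return 1;
    }
    /* strlen() stops at the first NUL byte, so this shows how much of
       the argument the program actually received as a C string. */
    printf("strlen(argv[1]) = %zu\n", strlen(argv[1]));
    return 0;
}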