One solution is to use a Unicode library, such as ICU-TC, which will do the work for you.
To avoid the library dependency and do the conversion yourself, you will want to decode the variable-length UTF-8 input into 32-bit unsigned code points, and then encode those 32-bit code points into UTF-16's variable-length sequence of 16-bit values.
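As a rough sketch, the UTF-8 decoding step could look like the function below. The name decode_utf8 is made up for illustration, and a production decoder should also validate continuation bytes and reject overlong, surrogate, and out-of-range sequences:

#include <stddef.h>
#include <stdint.h>

/* Sketch: decode one UTF-8 sequence starting at p into a 32-bit code point.
   Returns the number of bytes consumed, or 0 for an invalid lead byte. */
static size_t decode_utf8(const unsigned char *p, uint32_t *out)
{
    if (p[0] < 0x80) {                        /* 1 byte:  0xxxxxxx */
        *out = p[0];
        return 1;
    } else if ((p[0] & 0xE0) == 0xC0) {       /* 2 bytes: 110xxxxx 10xxxxxx */
        *out = ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        return 2;
    } else if ((p[0] & 0xF0) == 0xE0) {       /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        *out = ((uint32_t)(p[0] & 0x0F) << 12) |
               ((uint32_t)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
        return 3;
    } else if ((p[0] & 0xF8) == 0xF0) {       /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        *out = ((uint32_t)(p[0] & 0x07) << 18) |
               ((uint32_t)(p[1] & 0x3F) << 12) |
               ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
        return 4;
    }
    return 0;
}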
You will need to open your output file in binary mode for writing:
FILE *outfile = fopen(filename,"wb");
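fopen returns NULL on failure (for example, if the directory is not writable), so it is worth checking before you write anything:

if (outfile == NULL) {
    perror("fopen");   /* report why the file could not be opened */
    return 1;          /* or handle the error however suits your program */
}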
UTF-16 may be written in either little-endian or big-endian byte order. To disambiguate, UTF-16 defines a byte order mark: the code point 0xFEFF, which you write first. The order in which its two bytes appear in the file tells the reader which endianness the file was written in (see the explanation in the UTF-16 article on Wikipedia). The code:
uint16_t byte_ordering_sentinel = 0xFEFF;   /* uint16_t (from <stdint.h>) is guaranteed to be 16 bits */
fwrite(&byte_ordering_sentinel, 2, 1, outfile);
For each 32-bit code point, you will need to follow the UTF-16 rules to produce either a single 16-bit value or a surrogate pair of two 16-bit values. For each 16-bit UTF-16 value, you would do:
fwrite(&next_utf16_value, 2, 1, outfile);
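As a sketch of those rules: code points below 0x10000 become a single 16-bit value, and everything above is split into a high/low surrogate pair. The helper name encode_utf16 is made up for illustration, and it assumes the code point is already a valid Unicode scalar value:

#include <stdint.h>

/* Sketch: encode one code point into 1 or 2 UTF-16 values in out[].
   Returns the number of 16-bit values produced. */
static int encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                      /* Basic Multilingual Plane: one value */
        return 1;
    }
    cp -= 0x10000;                                  /* leaves 20 bits */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));       /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));     /* low surrogate */
    return 2;
}

Each value returned would then be written with fwrite as shown above.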
NOTE 1: Endianness is a property of your CPU and operating system. Intel CPUs are always little-endian. ARM CPUs can run either way, and are little-endian under Android. If you wish to change the endianness of the output, you need to byte-swap each 16-bit value before it is written. Be sure to also byte-swap the initial byte_ordering_sentinel.
On Linux, you can byte-swap efficiently using the macros in byteswap.h.
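For example, with glibc's bswap_16 macro (byteswap.h is Linux/glibc-specific; on other systems you can swap the two bytes yourself with shifts):

#include <byteswap.h>                            /* glibc: bswap_16, bswap_32, bswap_64 */

uint16_t swapped = bswap_16(next_utf16_value);   /* swap the two bytes of the 16-bit value */
fwrite(&swapped, 2, 1, outfile);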
NOTE 2: When using fgetc() it is important to check for the EOF value. There could be a race condition between your feof(arq) check and your fgetc() call if something else modifies the file while your program is running. Your loop could instead look like this:
while ( (num = fgetc(arq)) != EOF )   /* num must be declared as an int, not a char, for the EOF test to be reliable */
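Putting the pieces together, the whole conversion loop might look roughly like the sketch below. It assumes the decode_utf8 and encode_utf16 helpers sketched above, well-formed UTF-8 input, and that the machine's native byte order is the one you want in the file; error handling is left out:

unsigned char buf[4];
uint32_t cp;
uint16_t units[2];
int num;

while ((num = fgetc(arq)) != EOF) {
    buf[0] = (unsigned char)num;

    /* The lead byte determines how many continuation bytes follow. */
    int extra = (buf[0] < 0x80) ? 0 :
                ((buf[0] & 0xE0) == 0xC0) ? 1 :
                ((buf[0] & 0xF0) == 0xE0) ? 2 : 3;
    for (int i = 1; i <= extra; i++)
        buf[i] = (unsigned char)fgetc(arq);    /* assumes the input is well formed */

    if (decode_utf8(buf, &cp) == 0)
        break;                                 /* invalid lead byte: give up */

    int n = encode_utf16(cp, units);
    fwrite(units, 2, n, outfile);              /* written in native byte order */
}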