Apologies for the newbie nature of the question, but I've been chasing my tail for days.
I need to create a function that verifies whether a received buffer is actually encoded as UTF-8, and then applies a basic regex to exclude unwanted control characters.
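To make the goal concrete, here is a rough sketch of the kind of check meant above. It is not a finished implementation: the name is_clean_utf8 is made up for illustration, a plain loop stands in for the regex step, and the set of allowed control characters (tab, LF, CR) is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `buf` is well-formed UTF-8 and
// contains no control characters other than tab, LF and CR.
// (That allow-list is an assumption, adjust as needed.)
bool is_clean_utf8(const std::string& buf)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes;
    // anything below that is an overlong encoding.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    const std::size_t n = buf.size();
    std::size_t i = 0;
    while (i < n) {
        // Work on unsigned bytes so values >= 0x80 are not sign-extended.
        const unsigned char b0 = static_cast<unsigned char>(buf[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;            // stray continuation or invalid lead byte

        if (len > n - i) return false;  // sequence truncated at end of buffer

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(buf[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate

        // Reject unwanted control characters (the assumed allow-list).
        if (cp < 0x20 && cp != '\t' && cp != '\n' && cp != '\r') return false;
        if (cp == 0x7F) return false;

        i += len;
    }
    return true;
}
```

Called on a std::string holding the received bytes, this should accept the four test sequences listed below and reject things like a lone 0x80 or a truncated multi-byte sequence.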
I started by recursively dumping:
0x62
0xCDBC
0xE0AC89
0xF09F8489
Into a test file.
That worked fine: I copied the file around, and text editors on Windows, Linux and Mac can all read it (and display the correct characters).
But when I try to read it back into my test function it dies, so I added a loop to dump the bytes:
    char c = fs->get();
    while(fs->good())
    {
        int len = sizeof(c);
        printf("0x%X --- %i\n",c,len);
        c = fs->get();
    }
Yes, I know the code sucks, but what I don't understand is why I'm getting this output:
Hex sizeof()
0x26 --- 1
0xFFFFFFCD --- 1
0xFFFFFFBC --- 1
0xFFFFFFE0 --- 1
0xFFFFFFAC --- 1
0xFFFFFF89 --- 1
0xFFFFFFF0 --- 1
0xFFFFFF9F --- 1
0xFFFFFF84 --- 1
0xFFFFFFB9 --- 1
The 0x62 becomes a 0x26, whilst all the other values are correct but padded out to 32 bits with leading FFs...?
locale is EN_en.utf8
I'm lost. Any ideas out there?
Thanks in advance, Bob.