I have a program that tries to mimic the functionality of the (Linux) file command. It parses a .txt file and classifies the contents by their encoding. However, I struggle to identify ISO-8859-1 (Latin-1) files, because the ISO-8859-1 characters I write end up stored as UTF-8 instead (for instance æ, which is e6 in ISO-8859-1, is stored as c3 a6).
When I create this .txt file and pass it to file:
printf "æøå" > test.txt
file test.txt
it returns simply:
UTF-8 Unicode text, with no line terminators.
and od -c -tx1 test.txt returns:
0000000 303 246 303 270 303 245
         c3  a6  c3  b8  c3  a5
0000006
Can anyone explain why this is the case? The characters 'æøå' all exist in ISO-8859-1, yet the file is written as, and detected as, UTF-8.
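For comparison, here is a minimal sketch of how the two byte sequences could be written explicitly, so the result does not depend on the terminal's encoding. It assumes a POSIX printf that accepts octal escapes; the file names latin1.txt and utf8.txt are only illustrative.

```shell
# Write æøå as raw ISO-8859-1 bytes (e6 f8 e5), one byte per character.
printf '\346\370\345' > latin1.txt

# Write æøå as UTF-8 bytes (c3 a6 c3 b8 c3 a5), two bytes per character.
printf '\303\246\303\270\303\245' > utf8.txt

# Dump both files as hex to compare the stored bytes.
od -An -tx1 latin1.txt
od -An -tx1 utf8.txt
```

The second file has the same bytes as my test.txt above. If the shell or terminal is what encodes the typed characters as UTF-8, I would expect file to classify latin1.txt differently from utf8.txt.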