I want to work with data from CSV file, but I realized letters are not showing correctly. I tried million ways to convert the encoding but nothing works. Working on MacOS, PHP 7.4.4.
After executing fgets()
or fgetcsv()
on handle variable, I will get this (2 rows/lines in example).
Kód ADM;Kód obce;Název obce;Kód MOMC;Název MOMC;Kód MOP;Název MOP;Kód èásti obce;Název èásti obce;Kód ulice;Název ulice;Typ SO;Èíslo domovní;Èíslo orientaèní;Znak èísla orientaèního;PSÈ;Souøadnice Y;Souøadnice X;Platí Od
1234;1234;HorniDolni;;;;;1234;HorniDolni;;;è.p.;2;;;748790401;4799.98;15893971.21;2013-12-01T00:00:00
It is more or less correct czech language, but letter č
is superseded by è
and ř
is superseded by ø
, neither of them are part of czech alphabet. I am confident, there will be more of the misplaced letters in the file.
Executing file -I path/to/file
I receive file: text/plain; charset=iso-8859-1
which is sad, because as far as wiki is concerned, this charset doesn't have a czech alphabet included.
Neither of following commands didn't converted misplaced letters:
mb_convert_encoding($line, 'UTF-8', 'ISO8859-1')
iconv('ISO-8859-1', 'UTF-8', $line)
iconv('ISO8859-1', 'UTF-8', $line)
I have noticed that in ISO-8859-1 the ø
letter has a code 00F8
. Windows-1250 (which includes czech aplhabet) has correct letter ř
with code 0159
but both of them are preceded by 00F8
. Same with letter č
and è
which are both preceded by code 00E7
. I do not understand encoding very deeply, but it seems that file is encoded in Windows-1250 but the interpreter thinks the encoding is ISO-8859-1 and takes letter that is in place/code of original one.
But neither conversion (ISO-8859-1 => Windows-1250, ISO-8859-1 => UTF-8 or other way around) is working.
Does anyone has any idea how to solve this? Thanks!