On a Linux server, if a user uploads a CSV file created in MS Excel on Windows (and therefore encoded in Windows-1250, a.k.a. cp1250; note this is what Windows tools label "ANSI", not ASCII), every method of detecting the file encoding that I know of incorrectly returns ISO-8859-1 (latin1, if you prefer).
This matters because the file ultimately has to be converted to UTF-8, and the conversion needs the correct source encoding.
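For context, the conversion step itself is trivial once the source encoding is known; a minimal sketch in Python (PHP's iconv() or mb_convert_encoding() do the same thing):

```python
# Assumption: the uploaded bytes really are Windows-1250.
raw = "žluťoučký kůň, Łódź".encode("cp1250")  # simulated uploaded file content

text = raw.decode("cp1250")   # interpret the bytes using the source encoding
utf8 = text.encode("utf-8")   # serialize the text as UTF-8

print(utf8.decode("utf-8"))   # round-trips cleanly
```

The hard part is only the first step: knowing that "cp1250" is the right thing to pass.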
Methods I tried:
- CLI:
  file -i [FILE]
  returns iso-8859-1, and
  file -b [FILE]
  also returns iso-8859-1
- vim:
  vim [FILE]
  and then
  :set fileencoding?
  returns latin1
- PHP:
  mb_detect_encoding(file_get_contents($filename))
  returns (surprisingly) UTF-8
The file is indeed in Windows-1250, as opening the CSV in LibreOffice Calc proves: it asks for the file encoding, and selecting either ISO-8859-1 or UTF-8 renders the special characters wrongly, while selecting Windows-1250 displays all characters correctly!
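None of these failures is surprising: single-byte encodings like Windows-1250 and ISO-8859-1 are structurally indistinguishable, and latin1 (as most tools implement it) assigns a character to every byte value, so any file "validly" decodes as latin1 and detectors tend to fall back to it. A small Python demonstration of the ambiguity:

```python
data = "Łódź".encode("cp1250")   # a Polish city name in Windows-1250

# Decoding as latin1 never raises an error; it just produces the wrong text:
wrong = data.decode("latin-1")
right = data.decode("cp1250")

print(repr(wrong))   # '£ód\x9f' -- byte-for-byte "valid", semantically garbage
print(right)         # 'Łódź'
```

Since both interpretations are byte-for-byte legal, a detector can only guess statistically, and without a language hint it usually guesses the more common Western European encoding.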
How can I correctly detect the file encoding on a Linux (Ubuntu) server, ideally with default Ubuntu utilities or with PHP?
The last option I can think of is to inspect the user agent (and thus the user's OS) during upload and, if it is Windows, automatically assume the encoding is Windows-1250...
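Rather than sniffing the user agent, a more robust workaround is trial decoding: valid UTF-8 almost never occurs by accident in a single-byte-encoded file, so try strict UTF-8 first and fall back to Windows-1250. A sketch in Python (the same logic is possible in PHP with mb_check_encoding()):

```python
def guess_encoding(raw: bytes) -> str:
    """Heuristic encoding guess for uploaded CSV files.

    Assumption: uploads are either UTF-8 (modern tools) or Windows-1250
    (Excel on Central European Windows); other encodings are not handled.
    """
    try:
        raw.decode("utf-8")   # strict UTF-8 validation
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1250"       # fall back to the assumed legacy encoding
```

If more legacy encodings than cp1250 are possible, the next step would be a statistical detector such as the uchardet command-line tool (available in the Ubuntu repositories, though not installed by default).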