I have a CSV file. If I set the charset to ISO-8859-2 (Eastern Europe) in LibreOffice Calc, it renders the characters correctly. However, the server's locale is set to EN-UK, and there I cannot read the characters correctly. For example, it returns T�t instead of Tót.

I tried many things like:

echo (mb_detect_encoding("T�t","ISO-8859-2","UTF-8"));

I know the character probably does not exist in UTF-8, but I tried anyway.

I also tried setting the correct charset in the header:

header('Content-Type: text/html; charset=iso-8859-2');
echo "T�th";

but it returns TÄĹźËth instead of Tóth.

Please help me solve this. Thanks in advance.

Rohit Gupta
Andrewboy
  • Maybe change locale of server, https://www.php.net/manual/en/function.setlocale.php... but why not just use utf8 everywhere? – user3783243 Sep 04 '22 at 22:33
  • I tried adding `setlocale(LC_ALL, 'hu_HU');` but it did not help – Andrewboy Sep 04 '22 at 22:49
  • I created a CSV, added a row with acute-accented o characters, and read it with `fgetcsv`. By default it doesn't display correctly, but after adding the content-type `header` with the character set it displayed correctly. Also, don't rely on just echoing out a string, as that can get dicey depending on the program you are using (and most editors default to UTF-8 anyway). Also make sure you're exporting from Libre with the correct character set, not just setting its default character set. – Jim Sep 05 '22 at 00:26
  • When you use `header('Content-Type: text/html; charset=iso-8859-2');` and look at the browser's response header, what is the character set? – Misunderstood Sep 05 '22 at 02:16
  • Well, I installed a text encoding plugin in Firefox and tried every possible charset, but none returns the desired value. When UTF-8 is selected, the returned text is T�th. Every other value returns something other than the question mark in a rectangle, but none of them is the desired character. – Andrewboy Sep 05 '22 at 09:44
  • PHP doesn't know or care about the encoding of your strings. Some individual functions do, but in that case they often have an optional `$encoding` argument where you can set `ISO-8859-2` if needed. And `mb_detect_encoding()` doesn't really do what its name suggests. Please edit the question and show real code. In particular, are you manipulating the file contents in any way? What output encoding do you need? – Álvaro González Sep 05 '22 at 14:17
  • @Jim - why don't you make that an answer? – Rohit Gupta Sep 06 '22 at 00:04

1 Answer

I advise against setting the header to charset=iso-8859-2. It is usual to work with UTF-8. If the data is available in a different encoding, it should be converted to UTF-8 and then processed as CSV. The following example code can be kept this simple because the newline characters are encoded identically in UTF-8 and ISO-8859-2.

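$fileName = "yourpath/Iso8859_2.csv";
$fp = fopen($fileName, "r");
if ($fp === false) {
  die("Cannot open $fileName");
}
// Read line by line, convert each line to UTF-8, then parse it as CSV.
while (($row = fgets($fp)) !== false) {
  $strUtf8 = mb_convert_encoding($row, 'UTF-8', 'ISO-8859-2');
  $arr = str_getcsv($strUtf8);
  var_dump($arr);
}
fclose($fp);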
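
The loop above reads one physical line at a time, so it would split a quoted field that contains a line break. If that matters for your file, one possible alternative is to let PHP transcode the stream and hand it to `fgetcsv`. This is only a sketch: it assumes the iconv extension is available, and the temporary file and its two sample rows are made up for illustration, not taken from the question.

```php
<?php
// Hypothetical temp file standing in for the real ISO-8859-2 CSV.
$fileName = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($fileName, "name,city\nT\xF3th,\"Gy\xF5r\"\n");

$fp = fopen($fileName, 'r');
if ($fp === false) {
    exit("Cannot open $fileName");
}
// Transcode the stream on the fly while reading (needs the iconv extension).
stream_filter_append($fp, 'convert.iconv.ISO-8859-2/UTF-8', STREAM_FILTER_READ);

$rows = [];
// fgetcsv() now receives UTF-8 and handles quoted fields that span lines.
while (($row = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
    $rows[] = $row;
}
fclose($fp);
unlink($fileName);

var_dump($rows);
```

The delimiter, enclosure and escape arguments are passed explicitly only to avoid relying on defaults; adjust them to match how the file was exported.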

The exact encoding of the CSV file must be known. mb_detect_encoding is not suitable for determining the encoding of a file.
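
To see why the unconverted text shows up as T�th on a UTF-8 page, it can help to look at the bytes. The following is a minimal sketch, assuming the mbstring extension is loaded; the sample string and the variable names (`$latin2`, `$utf8`) are invented for the example. In ISO-8859-2 the letter ó is the single byte 0xF3, which is not a valid UTF-8 sequence on its own.

```php
<?php
// Hypothetical sample: "Tóth" as ISO-8859-2 bytes (ó is the single byte 0xF3).
$latin2 = "T\xF3th";

// The lone 0xF3 byte is not valid UTF-8, so a UTF-8 page shows a replacement character.
$isValidUtf8 = mb_check_encoding($latin2, 'UTF-8');

// Converting from the known source encoding gives the two-byte UTF-8 form of ó (0xC3 0xB3).
$utf8 = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2');

var_dump($isValidUtf8);     // bool(false)
echo bin2hex($utf8), "\n";  // 54c3b37468
```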

jspit