I have a text file with some content in it, which I want to display on my webpage. I'm loading the content this way:

$txt = file_get_contents('new.txt');

When I display the content with charset=UTF-8 it looks like this:

[Screenshot: the text as rendered with charset=UTF-8]

When I use charset=ISO-8859-1 instead it looks like this:

[Screenshot: the text as rendered with charset=ISO-8859-1]

I want the text to look like the second example, but while using charset=UTF-8 instead of ISO-8859-1. How can I convert the text so that it is displayed correctly?

TmCrafz
  • Possible duplicate of [UTF-8 all the way through](https://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – Obsidian Age Oct 14 '19 at 03:57
  • If you're sure that all the source text files are in ISO-8859-1 encoding, you can simply use [iconv](https://www.php.net/manual/en/function.iconv.php) to convert them to UTF-8 before displaying them. [This answer](https://stackoverflow.com/a/21376914/372172) suggests using iconv through [stream_filter_append](https://www.php.net/manual/en/function.stream-filter-append.php), which is a neat trick for handling large files. – Koala Yeung Oct 14 '19 at 05:02
  • It is worth mentioning that you can simply [find](https://www.lifewire.com/uses-of-linux-command-find-2201100) all the text files and batch-convert them with [iconv](https://www.geeksforgeeks.org/iconv-command-in-linux-with-examples/). If all your text files are going to be UTF-8 in the future, this is the way to sanitize the old data and save yourself from converting at display time. – Koala Yeung Oct 14 '19 at 05:04

1 Answer

You can either convert the raw data once or convert it on the fly each time it is displayed.

If you're planning to store all new data in UTF-8, then batch-converting all the old data is the better option. It is not fun to have mixed encodings in your raw data. You may refer to this question for advice on batch conversion commands.
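If you would rather do the batch conversion from PHP than from the shell, a minimal sketch could look like the following. The helper name `convertFileToUtf8` and the `data` directory are hypothetical, the mbstring extension is assumed to be available, and the "already valid UTF-8" check is only a heuristic for avoiding double conversion.

```php
<?php
// Hypothetical helper: converts one file from ISO-8859-1 to UTF-8 in place.
// Returns true if the file was rewritten, false if it was skipped or failed.
function convertFileToUtf8(string $path): bool
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        return false; // file could not be read
    }
    // Heuristic: content that is already valid UTF-8 (including plain ASCII)
    // is left alone, so running the script twice does not double-encode it.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return false;
    }
    $converted = iconv('ISO-8859-1', 'UTF-8', $raw);
    if ($converted === false) {
        return false; // conversion failed
    }
    return file_put_contents($path, $converted) !== false;
}

// Assumed layout: the text files live in a "data" directory next to this script.
foreach (glob(__DIR__ . '/data/*.txt') ?: [] as $file) {
    echo $file, ': ', convertFileToUtf8($file) ? "converted\n" : "skipped\n";
}
```

Back up the files before running something like this, since the conversion overwrites them.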

On the other hand, if you're going to keep the input and storage in ISO-8859-1 encoding, the only thing you can do is convert the document on the fly:

$txt = iconv('iso-8859-1', 'utf-8', file_get_contents('new.txt'));

Or, if your source files mix ISO-8859-1 with other, unknown encodings, you can append the //IGNORE flag to the target charset so that characters that cannot be represented are silently discarded instead of raising an error:

$txt = iconv('iso-8859-1', 'utf-8//IGNORE', file_get_contents('new.txt'));
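If some files may already be UTF-8 while others are still ISO-8859-1, one possible approach is to check the content before converting. The sketch below assumes that anything that is not valid UTF-8 is ISO-8859-1; the helper name `readAsUtf8` is hypothetical and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: returns the content of $path as UTF-8.
// Assumption: content that is not valid UTF-8 is ISO-8859-1.
function readAsUtf8(string $path): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Valid UTF-8 (including plain ASCII) is returned untouched,
    // which avoids converting the same text twice.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    $converted = iconv('ISO-8859-1', 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert $path to UTF-8");
    }
    return $converted;
}

// Possible usage on the page (left commented out so the sketch has no side effects):
// header('Content-Type: text/html; charset=UTF-8');
// echo nl2br(htmlspecialchars(readAsUtf8('new.txt'), ENT_QUOTES, 'UTF-8'));
```

The check is a heuristic: a short ISO-8859-1 text can, in rare cases, also be valid UTF-8, so it is no substitute for knowing the encoding of your files.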

Converting on the fly costs extra computation every time the page is displayed, so it is preferable to convert the raw content once (unless that is not possible in your situation).
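If you do convert on the fly and the files are large, the stream-filter approach mentioned in the comments avoids loading the whole file into memory. This is a minimal sketch, assuming the iconv extension is enabled so that PHP's built-in `convert.iconv.*` filters are available; the helper name `streamAsUtf8` is hypothetical.

```php
<?php
// Hypothetical helper: copies an ISO-8859-1 file to an output stream as UTF-8,
// converting chunk by chunk instead of reading the whole file at once.
function streamAsUtf8(string $path, $out): void
{
    $in = fopen($path, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Built-in conversion filter, written as convert.iconv.<from>/<to>.
    stream_filter_append($in, 'convert.iconv.ISO-8859-1/UTF-8', STREAM_FILTER_READ);
    stream_copy_to_stream($in, $out);
    fclose($in);
}

// Possible usage on the page (left commented out so the sketch has no side effects):
// header('Content-Type: text/plain; charset=UTF-8');
// streamAsUtf8('new.txt', fopen('php://output', 'wb'));
```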

Koala Yeung