I'm writing a small web app that will receive and parse tab-delimited text files from users. Those files will be submitted either via a textarea or via a `multipart/form-data` form. They will come in a variety of charsets, including Asian encodings. Consequently, I am trying to use UTF-8 throughout the app.
The site is entirely (as far as I know) in UTF-8:
- Each PHP file is saved in UTF-8 encoding;
- I have added `default_charset = "utf-8"` to my `php.ini` file;
- The HTML header contains the required UTF-8 mentions: `header('Content-Type:text/html; charset=UTF-8');` ... `<?xml version="1.0" encoding="utf-8" ?>` ... `<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />`
- The textarea forms contain the `accept-charset="UTF-8"` attribute;
- The db is collated in UTF-8;
- Each connection to the db includes the option `1002 => 'SET NAMES utf8'`.
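
For reference, the connection option in the last bullet would typically sit in a driver options array like the one sketched below. This assumes PDO with the MySQL driver, which the question does not state explicitly; the DSN and credentials are hypothetical placeholders, and the actual connection call is left commented out.

```php
<?php
// Hypothetical DSN, for illustration only (not from the question).
$dsn = 'mysql:host=localhost;dbname=example';

// Driver options passed as the fourth argument of the PDO constructor.
$options = [
    // 1002 is the numeric value of PDO::MYSQL_ATTR_INIT_COMMAND:
    // the statement is run right after connecting.
    1002 => 'SET NAMES utf8',
];

// Assumed usage, with placeholder credentials:
// $pdo = new PDO($dsn, 'user', 'password', $options);
```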
Now, I just discovered that I needed to set `mb_regex_encoding` to UTF-8 manually for one of my parsing functions to work (I use `mb_split()` to identify and replace tabs and new lines). So ...
What else do I need to do to make sure my site is once and for all UTF-8 throughout? In particular, are there any other encoding functions I should set, such as `mb_internal_encoding()`, and if so, where in the code should I do that (e.g., at the start of the `index.php` file)?
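
To make the `mb_split()` situation concrete, here is a minimal sketch of that kind of parsing. The function name `parse_tab_delimited` and the row/column array layout are assumptions made for illustration, not the actual code of the app; it requires the mbstring extension with regex support.

```php
<?php
// Hypothetical helper: splits tab-delimited text into rows of fields.
function parse_tab_delimited(string $text): array
{
    // mb_split() works in the encoding set by mb_regex_encoding(),
    // so it is set explicitly before any splitting is done.
    mb_regex_encoding('UTF-8');

    $rows = [];
    // Split on Windows, old Mac, and Unix line endings.
    foreach (mb_split('\r\n|\r|\n', $text) as $line) {
        if ($line === '') {
            continue; // skip blank lines, e.g. after a trailing newline
        }
        // Split each non-empty line into its tab-separated fields.
        $rows[] = mb_split('\t', $line);
    }
    return $rows;
}

$rows = parse_tab_delimited("名前\t値\n東京\t42\n");
```

With this input, `$rows` holds two rows of two fields each. Calling `mb_regex_encoding()` inside the helper keeps the sketch self-contained; whether it belongs there or in a shared bootstrap file is exactly the open question.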