We run a site where users upload image files. When these files are produced on a Mac, they sometimes have non-ASCII, UTF-8 encoded characters in their file names (since macOS uses UTF-8 for file names).
When our PHP 7 code receives these files, we have to store them on the local file system, which is Debian Linux and, as far as I know, does not support UTF-8.
Also, while PHP 7 can support UTF-8, it does not do so natively or automatically.
So, the question is: what's the current best practice for handling this?
Thought 1:
Save the original name in the database (collation = utf8mb4_unicode_ci?), and store the images on disk under a UUID. Then use the download="" attribute to have the file download under its original file name.
Pro: Seems to solve the problem.
Con: multibyte support seems kludgy and clunky in PHP (even in 7.2.x+). Does this require a ton of checks to handle correctly?
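For illustration, here is a minimal sketch of Thought 1 in PHP 7. The helper names (makeStorageName, isValidUtf8, downloadLink) and the extension allow-list are assumptions made up for this example, and the on-disk name is random hex rather than a formal UUID. It suggests the checks may be few: one UTF-8 validity test on the way in and one htmlspecialchars() call on the way out.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: build an ASCII-only name for the file on disk.
// Uses random hex instead of a formal RFC 4122 UUID; any unique ASCII token would do.
function makeStorageName(string $originalName, array $allowedExt = ['jpg', 'jpeg', 'png', 'gif']): string
{
    // Byte-wise search is safe here: '.' never occurs inside a multibyte UTF-8 sequence.
    $dot = strrpos($originalName, '.');
    $ext = $dot === false ? '' : strtolower(substr($originalName, $dot + 1));
    $id  = bin2hex(random_bytes(16)); // 32 hex characters

    // Keep the extension only if it is on the (assumed) allow-list.
    return in_array($ext, $allowedExt, true) ? $id . '.' . $ext : $id;
}

// Hypothetical helper: true if the client-supplied name is well-formed UTF-8.
// With the 'u' modifier, preg_match() fails on malformed UTF-8 input.
function isValidUtf8(string $name): bool
{
    return preg_match('//u', $name) === 1;
}

// Hypothetical helper: render a link that downloads under the original name.
function downloadLink(string $url, string $originalName): string
{
    $safeName = htmlspecialchars($originalName, ENT_QUOTES, 'UTF-8');

    return sprintf(
        '<a href="%s" download="%s">%s</a>',
        htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
        $safeName,
        $safeName
    );
}
```

In an upload handler, the client-supplied name from $_FILES would be checked with isValidUtf8() and saved in the database, while the file itself would be moved with move_uploaded_file() to the name returned by makeStorageName(). Note that browsers only honor the download attribute for same-origin URLs.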
Thought 2:
Sanitize the file name by filtering out the non-ASCII characters, to avoid the problem altogether.
Pro: I can keep using a Latin collation in MySQL / MariaDB, as we always have, and I don't have to worry about file system character sets.
Con: This is lossy. A file named touché.pdf will get renamed touch.pdf, or I have to create some equivalency tables to turn é into e.
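For illustration, the equivalency-table variant of Thought 2 might look roughly like this. The function name asciiFileName and the contents of the table are assumptions for this example; the table is a tiny sample, not a complete mapping. PHP's intl extension also provides a Transliterator class that may cover far more characters, if it is installed.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: reduce a UTF-8 file name to printable ASCII.
function asciiFileName(string $name): string
{
    // Small illustrative sample of precomposed letters; a real table would be much larger.
    $map = [
        "\u{00E9}" => 'e',  // e with acute
        "\u{00E8}" => 'e',  // e with grave
        "\u{00EA}" => 'e',  // e with circumflex
        "\u{00E0}" => 'a',  // a with grave
        "\u{00E4}" => 'a',  // a with diaeresis
        "\u{00F6}" => 'o',  // o with diaeresis
        "\u{00FC}" => 'u',  // u with diaeresis
        "\u{00F1}" => 'n',  // n with tilde
        "\u{00E7}" => 'c',  // c with cedilla
        "\u{00DF}" => 'ss', // sharp s
    ];

    // strtr() replaces byte sequences, which is safe for well-formed UTF-8.
    $name = strtr($name, $map);

    // Drop every byte outside printable ASCII. This also removes combining
    // marks, so a decomposed "e" + U+0301 is reduced to a plain "e".
    $name = (string) preg_replace('/[^\x20-\x7E]/', '', $name);

    // Fall back to a placeholder if nothing survived.
    return $name === '' ? 'file' : $name;
}
```

With this table, touché.pdf becomes touche.pdf instead of touch.pdf, but any character missing from the table is still dropped, so the approach remains lossy for scripts such as Japanese or Cyrillic.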
Thought 3:
I have over-thought this problem, or I am missing a simple solution.
What's the best way to deal with uploaded filenames that are UTF-8 / Multibyte?