
I need to write out a file to disk with special ISO-8859-15 characters. For my own testing purposes, I used:

—©®±àáâãäåæÒÓÔÕÖ¼½¾§µçðþú–.jpg

...but when the file was written to disk with this name, the em dash, the en dash, and the ¼, ½, and ¾ fractions were replaced with garbage, while the other characters in the file name were written out correctly. Why some and not others?

Here is a very simple PHP script to write out a file with just copyright symbols and em-dashes in its name. When I run it, the string is written to the file correctly, but the filename's em-dashes are replaced with garbage:

<?php
    // First, create a text file with the em-dash and the copyright symbol, then put the file prefix into the file:
    $filename1 = "000—©—©.txt";
    $content1 = "000—©—©";
    file_put_contents($filename1, $content1);
?>

What is the most efficient and elegant way to do this using PHP (or JavaScript)? I'm targeting the ISO-8859-15 character set ONLY.

Many thanks! Tom

Tom
  • Make sure the text editor you use to save the PHP file saves it in your target character set; otherwise these characters will be turned into garbage just by the act of saving the PHP code. – developerwjk Feb 12 '15 at 00:06
  • `–—¼½¾` characters do not exist in the ISO-8859-15 encoding. They do in the similar Windows code page 1252 encoding, and the fractions do in the similar ISO-8859-1 encoding. ISO-8859-15 is very rarely used. – bobince Feb 12 '15 at 11:10
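
The point in the last comment can be checked directly. The following is a minimal sketch, not part of the original thread, that round-trips a few of the test characters through each encoding with iconv(). The helper name isRepresentable() is hypothetical, and the characters are written as \u{...} escapes (PHP 7+) so that the result does not depend on the encoding the script itself is saved in.

```php
<?php
// Hypothetical helper (not from the thread): true if $text survives a
// UTF-8 -> $encoding -> UTF-8 round trip unchanged.
function isRepresentable(string $text, string $encoding): bool
{
    // @ hides the notice iconv() raises when a character cannot be converted.
    $encoded = @iconv('UTF-8', $encoding, $text);
    if ($encoded === false) {
        return false;
    }
    return @iconv($encoding, 'UTF-8', $encoded) === $text;
}

$samples = [
    'copyright sign' => "\u{00A9}",
    'one half'       => "\u{00BD}",
    'en dash'        => "\u{2013}",
    'em dash'        => "\u{2014}",
];

foreach (['ISO-8859-15', 'ISO-8859-1', 'WINDOWS-1252'] as $encoding) {
    foreach ($samples as $label => $char) {
        printf("%-12s %-14s %s\n", $encoding, $label,
            isRepresentable($char, $encoding) ? 'yes' : 'no');
    }
}
```

If the comment is right, the two dashes should report "yes" only for WINDOWS-1252, and the one-half fraction should report "no" for ISO-8859-15. The round trip is used instead of a bare return-value check because some iconv implementations substitute a placeholder character rather than failing.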

1 Answer


I've found my own answer. First, it turns out that I need WINDOWS-1252 encoding. Second, all I need to do is call iconv() to convert the file name from 'UTF-8' to 'WINDOWS-1252', like so:

<?php
    // First, create a text file with the em-dash and the copyright symbol, then put the file prefix into the file:
    $filename1 = "000—©—©.txt";
    $content1 = "000—©—©";

    // Judicious use of iconv() does the trick:
    $filename1 = iconv('UTF-8', 'WINDOWS-1252', $filename1);
    file_put_contents($filename1, $content1);
?>
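
One caveat with the snippet above: iconv() returns false when it cannot convert a character, and that false would then be passed straight to file_put_contents(). A hedged variation that checks the result first might look like the sketch below. encodeFilename() is a hypothetical helper name, the default target encoding is simply the one chosen in this answer, and how unconvertible input is handled can vary between iconv implementations.

```php
<?php
// Hypothetical helper (not from the thread): convert a UTF-8 file name to the
// target encoding, or throw instead of silently producing a broken name.
function encodeFilename(string $utf8Name, string $targetEncoding = 'WINDOWS-1252'): string
{
    // @ hides the notice; the false return value is handled explicitly below.
    $encoded = @iconv('UTF-8', $targetEncoding, $utf8Name);
    if ($encoded === false) {
        throw new RuntimeException("Cannot represent file name in $targetEncoding");
    }
    return $encoded;
}

// "000" + em dash + copyright sign + em dash + copyright sign + ".txt",
// written with escapes so the script's own encoding does not matter.
$encoded = encodeFilename("000\u{2014}\u{00A9}\u{2014}\u{00A9}.txt");

// Show the resulting bytes: the em dash is 0x97 and the copyright sign 0xA9.
echo bin2hex($encoded), "\n"; // prints "30303097a997a92e747874"
```

The returned string can then be passed to file_put_contents() exactly as in the answer's code.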

My only lingering question, given that I'm testing this on XAMPP on my local Windows machine, is whether WINDOWS-1252 encoding will work on actual servers at the major hosting services (GoDaddy, etc.). If not, is there a different encoding that supports everything included in WINDOWS-1252 but is better suited to servers other than a local XAMPP install?

There's a complete listing of encodings supported by iconv here. Several are on the same line as WINDOWS-1252; does that mean they are interchangeable?

Many thanks, Tom

Tom
  • Code page 1252 is the “ANSI code page” (default for applications like PHP which use the MS C runtime for file access) in the Western Europe locale. In other regions you may get different results. On non-Windows servers the file naming scheme is based on bytes, so there's no inherent encoding, but typically modern Linux servers prefer UTF-8. So unfortunately there is no good answer cross-platform. Because this is so unreliable it is generally best to avoid putting non-ASCII in filenames. – bobince Feb 12 '15 at 11:04
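
Following the advice in the comment above to keep non-ASCII characters out of file names, one possible approach is to reduce the name to a conservative ASCII subset before writing. This is only a sketch under stated assumptions: asciiSafeFilename() is a hypothetical helper, and the whitelist (letters, digits, dot, underscore, hyphen) is an arbitrary choice, not something specified in the thread.

```php
<?php
// Hypothetical helper (not from the thread): replace every run of characters
// outside a conservative ASCII whitelist with a single underscore.
function asciiSafeFilename(string $utf8Name): string
{
    // The /u flag makes the pattern operate on UTF-8 code points, not bytes.
    $safe = preg_replace('/[^A-Za-z0-9._-]+/u', '_', $utf8Name);
    if ($safe === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8 input.
        throw new InvalidArgumentException('File name is not valid UTF-8');
    }
    return $safe;
}

echo asciiSafeFilename("000\u{2014}\u{00A9}\u{2014}\u{00A9}.txt"), "\n"; // prints "000_.txt"
```

The trade-off is that different original names can collapse to the same safe name, so a real application would probably need to keep the original name elsewhere (for example in a database) or add a unique suffix.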