Try this out.
The example below is something I put together in a test environment, so you might need to change the code slightly.
I had a text file containing the following data:
test
café
áÁÁÁááá
žžœš¥±
ÆÆÖÖÖasØØ
ß
Then I had a form that took a file input and ran the following code:
<?php
// Regroups $files from [input][field][index] into [input][index][field].
// Assumes every input uses array syntax, e.g. name="file[]".
function neatify_files(&$files) {
    $tmp = array();
    // Work on the array that was passed in rather than reading $_FILES directly.
    foreach ($files as $input => $fields) {
        foreach ($fields as $field => $values) {
            foreach ($values as $j => $value) {
                $tmp[$input][$j][$field] = $value;
            }
        }
    }
    return $files = $tmp;
}
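if (isset($_POST["submit"])) {
    neatify_files($_FILES);
    $file = $_FILES["file"][0];
    $handle = fopen($file["tmp_name"], "r");
    // Compare against false so a final line containing only "0" is not skipped.
    while (($line = fgets($handle)) !== false) {
        // With a single candidate and strict mode this returns "UTF-8" or false.
        $enc = mb_detect_encoding($line, "UTF-8", true);
        if ($enc === false) {
            // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
            // Change this (e.g. to Windows-1252) to match your data.
            $line = iconv("ISO-8859-1", "UTF-8", $line);
        }
        echo "<p>" . htmlspecialchars($line) . "</p>";
    }
    fclose($handle);
}
?>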
<form action="<?= $_SERVER["PHP_SELF"]; ?>" method="POST" enctype="multipart/form-data">
<input type="file" name="file[]" />
<input type="submit" name="submit" value="Submit" />
</form>
The function neatify_files is something I wrote to make the layout of the $_FILES array more logical.
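To make the difference in layout concrete, here is a minimal sketch that regroups a hand-built array shaped like what PHP produces for a name="file[]" input. The function name regroup_uploads and the sample values are made up for illustration; it is not the exact function from the code above.

```php
<?php
// Hypothetical sample mimicking the $_FILES entry for <input type="file" name="file[]">.
$raw = [
    "file" => [
        "name"     => ["a.txt", "b.txt"],
        "type"     => ["text/plain", "text/plain"],
        "tmp_name" => ["/tmp/phpA", "/tmp/phpB"],
        "error"    => [0, 0],
        "size"     => [12, 34],
    ],
];

// Regroup [input][field][index] into [input][index][field].
function regroup_uploads(array $files): array
{
    $out = [];
    foreach ($files as $input => $fields) {
        foreach ($fields as $field => $values) {
            // The cast lets a single (non-array) upload pass through as index 0.
            foreach ((array) $values as $i => $value) {
                $out[$input][$i][$field] = $value;
            }
        }
    }
    return $out;
}

$neat = regroup_uploads($raw);
echo $neat["file"][1]["name"], "\n"; // prints "b.txt"
```

After regrouping, each uploaded file is one self-contained array, so you can loop over $neat["file"] and read name, tmp_name and so on from each entry.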
The form is a standard form that simply POSTs the data to the server.
Note: Using $_SERVER["PHP_SELF"] is a security risk: if it is echoed unescaped, a crafted URL can inject markup into the page (XSS).
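One common way to reduce that risk is to escape the value before printing it. This is only a sketch: escape_attr is a hypothetical helper name, and the hostile path is a made-up example of what a crafted request could put into PHP_SELF.

```php
<?php
// Hypothetical helper: escape a value before echoing it into an HTML attribute.
function escape_attr(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, "UTF-8");
}

// Simulated hostile value standing in for $_SERVER["PHP_SELF"].
$self = '/upload.php/"><script>alert(1)</script>';

// The quotes and angle brackets are turned into entities, so the attribute cannot be closed early.
echo '<form action="' . escape_attr($self) . '" method="POST">', "\n";
```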
When the data is posted I store the first uploaded file in a variable. Obviously, if you are using the multiple attribute your code won't look quite like this.
$handle stores a file handle for the text file (not its contents), opened read-only; hence the "r" argument. fgets then reads from that handle one line at a time.
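A small, self-contained sketch of that read loop, using a temporary file instead of an upload (the file contents here are made up for the example):

```php
<?php
// Write a small sample file so the example does not depend on an upload.
$path = tempnam(sys_get_temp_dir(), "enc");
file_put_contents($path, "test\ncaf\xC3\xA9\n0");

$handle = fopen($path, "r");          // a stream resource, not the file contents
$wasResource = is_resource($handle);  // true while the handle is open

$lines = [];
// Comparing with false keeps the last line ("0") from ending the loop early.
while (($line = fgets($handle)) !== false) {
    $lines[] = rtrim($line, "\r\n");
}
fclose($handle);
unlink($path);

print_r($lines);
```

The explicit !== false check matters because a last line of just "0" with no trailing newline is falsy in PHP, so a plain while ($line = fgets($handle)) would drop it.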
$enc uses the mb_detect_encoding function to detect the encoding (duh).
At first I was having trouble obtaining the correct encoding. Setting the encoding_list to use only UTF-8 and setting strict to true fixed that: the function then returns "UTF-8" for valid UTF-8 input and false for anything else.
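Here is a quick sketch of what strict detection does with a single candidate encoding. The two byte strings are my own test values, written with escapes so the example does not depend on how this file itself is saved; it assumes the mbstring extension is loaded.

```php
<?php
$utf8   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$latin1 = "caf\xE9";     // "café" encoded as ISO-8859-1, which is not valid UTF-8

$encUtf8   = mb_detect_encoding($utf8, "UTF-8", true);
$encLatin1 = mb_detect_encoding($latin1, "UTF-8", true);

var_dump($encUtf8);   // string(5) "UTF-8"
var_dump($encLatin1); // bool(false)
```

Because the result is either "UTF-8" or false, the check effectively answers "is this line valid UTF-8?" rather than naming the actual source encoding.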
If the encoding is UTF-8 then I simply print the line; if it isn't, I convert it to UTF-8 using the iconv function.
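The detect-then-convert step can be wrapped up roughly like this. to_utf8 is a hypothetical helper, and the ISO-8859-1 fallback is an assumption on my part: strict detection only tells you the line is not UTF-8, so you have to decide what the source encoding is for your own data.

```php
<?php
// Hypothetical helper: returns $line as UTF-8.
// Assumption: input that is not valid UTF-8 is in $fallback (ISO-8859-1 by default).
function to_utf8(string $line, string $fallback = "ISO-8859-1"): string
{
    if (mb_detect_encoding($line, "UTF-8", true) !== false) {
        return $line; // already valid UTF-8, leave it alone
    }
    $converted = iconv($fallback, "UTF-8", $line);
    // iconv returns false on failure; hand back the original bytes in that case.
    return $converted === false ? $line : $converted;
}

echo to_utf8("caf\xE9"), "\n";     // ISO-8859-1 input, converted
echo to_utf8("caf\xC3\xA9"), "\n"; // UTF-8 input, unchanged
```

If your files come from Windows tools, passing "Windows-1252" as the second argument may be a better fit than the default.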