
I am building a web page on my CentOS server that traverses all my photos and videos and displays them. However, when a filename contains Danish national characters such as æøåÆØÅ, my exec command cannot access the file. I need exec because I use mediainfo to report the format and other details of each file (video, audio or image).

Let's assume I have this data and array that I am traversing (3 files):

$folder = "!My Folder"; // parent folder has a special char

Array
(
    [0] => Array
        (
            [filename] => !file #2.jpg
            [descr] => File with special chars, but no national chars

        )
    [1] => Array
        (
            [filename] => file with Danish æøå.jpg
            [descr] => File with Danish chars, but no special chars
        )
    [2] => Array
        (
            [filename] => file with no special.jpg
            [descr] => File with nothing special
        )
)

I am then reading the mediainfo info from PHP like this:

$param = escapeshellarg("$folder/$filename"); // escape file argument
exec("mediainfo $param", $outputArray); // store line-by-line output in an array

This works fine for files [0] and [2] (I get a populated array), but for file [1] the output array is effectively empty:

Array
(
    [0] => 
)
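One way to find out why the array comes back empty is to capture the exit status and the error output of the command as well. The following is only a diagnostic sketch, not part of my original code; runAndReport() is a hypothetical helper name, and it assumes exec() is allowed to run shell commands on the server.

```php
<?php
// Diagnostic sketch: run a command, capture stdout and stderr together,
// and record the exit status. runAndReport() is a hypothetical helper.
function runAndReport(string $command): array
{
    $lines = [];
    $exitCode = 0;
    // "2>&1" redirects stderr into stdout, so error messages from the
    // command (for example "file not found") end up in $lines as well.
    exec($command . ' 2>&1', $lines, $exitCode);

    return ['exit' => $exitCode, 'lines' => $lines];
}
```

Calling print_r(runAndReport("mediainfo $param")) should then show whether mediainfo printed an error or exited with a non-zero status for the file with Danish characters.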

Note that running mediainfo directly on the server works fine and returns detailed data:

[usr@srv !My Folder]# mediainfo file\ with\ Danish\ æøå.jpg

So it seems to be the exec that has some problems with this?

I am using PHP 8.1 and I have no problem accessing or storing files on my server, via Samba, with these Danish characters.

An alternative solution would be to rename the files with these characters, but ideally I hope to avoid doing that as it is kind of "destructive" (messing with the original files).

Does anyone have a good idea how to access those files via exec?

### UPDATE 1 - BUT STILL NO SOLUTION ###

To make it crystal clear, and to "prove" that this is related to exec, I followed the answer from @CBroe below and added one line that sets the locale character encoding in PHP with setlocale. I also echo the exec command:

setlocale(LC_CTYPE, "en_US.UTF-8");
$param = escapeshellarg("$folder/$filename");
exec("mediainfo $param", $outputArray);
echo "mediainfo $param";

The echo will output this (and an empty array):

mediainfo '/server/original/!My Folder/file with Danish æøå.jpg'

But if I run this exact same command directly on my server, mediainfo '/server/original/!My Folder/file with Danish æøå.jpg', it shows me the media info for the file.

I also believe that this is a character encoding issue, but I do not yet know how to solve it ;-)

### UPDATE 2 - BUT STILL NO SOLUTION ###

As suggested below, I also tried putenv('LANG=en_US.UTF-8'), but that did not help. I also tried shell_exec() instead of exec(), with the same result.

putenv("LANG=en_US.UTF-8");

$folder = "!My Folder"; // parent folder has a special char
$filename = "file with Danish æøå.jpg";

$param = escapeshellarg("$folder/$filename");
$output = shell_exec("mediainfo $param");
$outputArray = explode("\n",$output);
print_r($outputArray);

This will result in an array with two empty values:

Array
(
    [0] => 
    [1] => 
)
Beauvais

2 Answers

0

Somewhat late, but I stumbled across this today (pretty much the same situation: filenames from the glob() function, then processed further). This is not a problem with exec() but with the environment that mediainfo runs in.

If you use UTF-8 filenames, it is essential to set the LANG environment variable correctly.

putenv('LANG=en_US.UTF-8');

worked for me.
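A variation on the same idea, as a sketch only: instead of changing the environment of the PHP process, the locale variables can be handed to the child process through the env parameter of proc_open(). This assumes PHP 7.4 or later (for the array form of the command) and that the en_US.UTF-8 locale is installed on the server; runWithUtf8Locale() is a hypothetical helper name, not something from mediainfo or PHP.

```php
<?php
// Sketch: run a command with LANG/LC_ALL forced to a UTF-8 locale.
// runWithUtf8Locale() is a hypothetical helper; the default locale name
// is an assumption and must exist on the server.
function runWithUtf8Locale(array $argv, string $locale = 'en_US.UTF-8'): array
{
    // Start from the current environment so PATH etc. are preserved,
    // then override the locale variables for the child process only.
    $env = array_merge(getenv(), ['LANG' => $locale, 'LC_ALL' => $locale]);

    $descriptors = [
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr
    ];

    // An array command is executed without a shell, so the filename
    // does not need to pass through escapeshellarg() at all.
    $process = proc_open($argv, $descriptors, $pipes, null, $env);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start ' . $argv[0]);
    }

    // Fine for small outputs; very large stderr output could block here.
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    return ['exit' => proc_close($process), 'stdout' => $stdout, 'stderr' => $stderr];
}
```

With this, a call such as runWithUtf8Locale(['mediainfo', "$folder/$filename"]) passes the path as a single argument, and the stderr entry can be inspected if the output is still empty.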

stoney
  • I tested this and it made no difference for me. I would even say that this is exactly what I already tried with `setlocale(LC_CTYPE, "en_US.UTF-8");` – Beauvais Jan 25 '23 at 14:56
  • `setlocale()` doesn't make a difference here, but `putenv()` does (I use `shell_exec()` or backticks to make `mediainfo` happy though). – stoney Jan 30 '23 at 15:39
-1

Testing echo escapeshellarg("file with Danish æøå.jpg"); on https://3v4l.org/FEWfI only gives me 'file with Danish .jpg' as result.

Checking the user comments for the function, there is https://www.php.net/manual/en/function.escapeshellarg.php#99213:

When escapeshellarg() was stripping my non-ASCII characters from a UTF-8 string, adding the following fixed the problem:

<?php
setlocale(LC_CTYPE, "en_US.UTF-8");
?>

That indeed appears to fix the problem, https://3v4l.org/1DfpF - result now is 'file with Danish æøå.jpg'
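Since setlocale() returns false when the requested locale is not installed, it may be worth checking its return value before relying on escapeshellarg(). A minimal sketch, assuming at least one of the listed UTF-8 locales exists on the server; escapeUtf8Arg() is a hypothetical helper name and the candidate list is only an example:

```php
<?php
// Sketch: escape a UTF-8 argument only after a UTF-8 LC_CTYPE locale
// has actually been set. escapeUtf8Arg() is a hypothetical helper.
function escapeUtf8Arg(string $arg, array $candidates = ['en_US.UTF-8', 'C.UTF-8']): string
{
    // setlocale() tries each candidate in order and returns false
    // if none of them is available on this system.
    if (setlocale(LC_CTYPE, $candidates) === false) {
        throw new RuntimeException('No UTF-8 locale available: ' . implode(', ', $candidates));
    }

    return escapeshellarg($arg);
}
```

If the exception is thrown, the locale is missing on the machine, which would explain stripped characters even though the setlocale() call is present in the code.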

CBroe
  • `echo` may show the Danish chars, but `exec` still returns an empty array when I add `setlocale(LC_CTYPE, "en_US.UTF-8");` just before setting the `$param` variable, so no fix as far as I can see, sorry :-) – Beauvais Aug 31 '22 at 08:57
  • So what do you get when you do an `echo "mediainfo $param"`? Does that look the same as the working version you enter directly via console? – CBroe Aug 31 '22 at 09:03
  • Yes, when echoing the command to the web page and copying it to the CentOS shell, then it works fine in the shell and it will show detailed info from `mediainfo` for the file with Danish chars. – Beauvais Aug 31 '22 at 09:08
  • Sounds like a problem with different character encodings in some place. Is your page using UTF-8? Where did you get the array of file data from, was that read directly from the file system? – CBroe Aug 31 '22 at 09:12
  • My web page is in UTF-8 and the files array is coming directly from a `scandir`. – Beauvais Aug 31 '22 at 09:14
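To follow up on the encoding question from the comments, the names returned by scandir can be checked byte by byte. This is a sketch under the assumption that the filenames are expected to be UTF-8 on disk; findNonUtf8Names() is a hypothetical helper name. It only detects names that are not valid UTF-8 (for example ISO-8859-1 bytes); it does not detect differences in Unicode normalization.

```php
<?php
// Sketch: list file names that are not valid UTF-8, together with a hex
// dump of their raw bytes. findNonUtf8Names() is a hypothetical helper.
function findNonUtf8Names(array $names): array
{
    $suspect = [];
    foreach ($names as $name) {
        // With the /u modifier, preg_match() returns false when the
        // subject string is not valid UTF-8.
        if (preg_match('//u', $name) === false) {
            $suspect[$name] = bin2hex($name); // raw bytes as hex
        }
    }

    return $suspect;
}
```

Running print_r(findNonUtf8Names(scandir($folder))) would return an empty array if every name is valid UTF-8; any entry it lists points to a file whose name is stored in a different encoding.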