
Prerequisites: hunspell and php5.

Test code from bash:

user@host ~/ $ echo 'sagadījās' | hunspell -d lv_LV,en_US
Hunspell 1.2.14
+ sagadīties

This works properly.

Test code (test.php):

$encoding = "lv_LV.utf-8";

setlocale(LC_CTYPE, $encoding); // test
putenv('LANG='.$encoding); // and another test

$raw_response = shell_exec("LANG=$encoding; echo 'sagadījās' | hunspell -d lv_LV,en_US");

echo $raw_response;

This returns:

Hunspell 1.2.14
& sagad 5 0: tagad, sagad?ties, sagaudo, sagand?, sagar?o
*
*

Screenshot (the output could not be posted as text because of the invalid characters): [image: Hunspell php invalid characters]

It seems that shell_exec cannot handle UTF-8 correctly. Or is some additional encoding/decoding needed?

EDIT: I had to set LANG to en_US.utf-8 (instead of lv_LV.utf-8) to get valid data.
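One possible explanation, which this thread does not confirm, is that the lv_LV.utf-8 locale was not generated on the machine while en_US.utf-8 was. The sketch below checks whether a locale name appears in the output of `locale -a`. The helper names `normalize_locale` and `locale_listed` are hypothetical, and the normalisation assumes the only spelling differences are letter case and `utf-8` versus `utf8`.

```shell
# Hypothetical helper: normalise a locale name so that "lv_LV.UTF-8",
# "lv_LV.utf-8" and "lv_LV.utf8" all compare equal.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/'
}

# Hypothetical helper: succeeds if the locale named in $1 appears in the
# list of locale names read from standard input (one name per line).
locale_listed() {
  wanted=$(normalize_locale "$1")
  while IFS= read -r name; do
    [ "$(normalize_locale "$name")" = "$wanted" ] && return 0
  done
  return 1
}

# Check the locale from the question against the locales installed here.
if locale -a 2>/dev/null | locale_listed 'lv_LV.utf-8'; then
  echo "lv_LV.utf-8 is installed"
else
  echo "lv_LV.utf-8 is not installed"
fi
```

If the locale is reported as missing, generating it on the system or using a UTF-8 locale that is installed (as in the EDIT above) are the two obvious things to try.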

Shady Medic
Kristaps Karlsons
  • Have you tried [`proc_open()`](http://php.net/manual/en/function.proc-open.php)? Seems to me like writing the data directly to the process' STDIN would be more reliable than bouncing it through the shell... – DaveRandom Apr 05 '12 at 13:01
  • @DaveRandom same output. But I just checked: `mb_detect_encoding(stream_get_contents($pipes[1]))` returns ASCII. That could be the problem. – Kristaps Karlsons Apr 05 '12 at 13:14

1 Answer


Try this code:

<?php

  // The word we are checking
  $subject = 'sagadījās';

  // We want file pointers for all 3 std streams
  $descriptors = array (
    0 => array("pipe", "r"),  // STDIN
    1 => array("pipe", "w"),  // STDOUT
    2 => array("pipe", "w")   // STDERR
  );

  // An environment variable
  $env = array(
    'LANG' => 'lv_LV.utf-8'
  );

  // Try and start the process
  if (!is_resource($process = proc_open('hunspell -d lv_LV,en_US', $descriptors, $pipes, NULL, $env))) {
    die("Could not start Hunspell!");
  }

  // Put pipes into sensibly named variables
  $stdIn = &$pipes[0];
  $stdOut = &$pipes[1];
  $stdErr = &$pipes[2];
  unset($pipes);

  // Write the data to the process and close the pipe
  fwrite($stdIn, $subject);
  fclose($stdIn);

  // Display raw output
  echo "STDOUT:\n";
  while (!feof($stdOut)) echo fgets($stdOut);
  fclose($stdOut);

  // Display raw errors
  echo "\n\nSTDERR:\n";
  while (!feof($stdErr)) echo fgets($stdErr);
  fclose($stdErr);

  // Close the process pointer
  proc_close($process);

?>

Don't forget to verify that the encoding of the file (and therefore the encoding of the data you are passing) actually is UTF-8 ;-)
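One way to run that check from the shell is to let `iconv` decode the file as UTF-8 and look at its exit status, since `iconv` fails on byte sequences that are not valid in the source encoding. This is a minimal sketch: `is_utf8` is a hypothetical helper name, and `test.php` is the file name from the question. Note that a pure-ASCII file also passes, because ASCII is a subset of UTF-8.

```shell
# Hypothetical helper: succeeds (exit status 0) if the file named in $1
# decodes cleanly as UTF-8, fails otherwise (including a missing file).
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# test.php is the file name used in the question.
if is_utf8 test.php; then
  echo "test.php is valid UTF-8"
else
  echo "test.php is missing or not valid UTF-8"
fi
```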

klaus triendl
DaveRandom
  • Thanks for the feedback. `mb_detect_encoding` returned ASCII or UTF-8 seemingly at random (per character/word). After a while I tried setting the LANG variable to en_US.utf-8 and it worked. Thanks! – Kristaps Karlsons Apr 05 '12 at 13:33