
I have a PHP script that calls a Python script with arguments, and the Python script returns several strings to PHP via print.

If Python returns Unicode characters, they don't display properly in the browser.
If PHP passes Unicode characters to Python as arguments, it breaks.
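
For the argument direction, a minimal sketch of how the receiving side might recover the text, assuming Python 3 on a Unix-like system (the script in this question is Python 2, where `sys.argv` entries are byte strings that can be decoded with `.decode("utf-8")` directly). The helper name `arg_to_text` is hypothetical. Python 3 decodes `sys.argv` with the filesystem encoding and the `surrogateescape` error handler, so the original bytes can be recovered and then read as UTF-8:

```python
import os


def arg_to_text(arg):
    """Recover UTF-8 text from one command-line argument (Python 3)."""
    # os.fsencode reverses the decoding Python applied to sys.argv,
    # giving back the bytes the caller actually passed.
    raw = os.fsencode(arg)
    # Assumption: the caller (PHP here) sent the argument as UTF-8.
    return raw.decode("utf-8")
```

It would be called as `arg_to_text(sys.argv[1])`. This only helps if the PHP file itself is saved as UTF-8, so that the bytes placed on the command line are UTF-8 in the first place.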

Python code:

# -*- coding: UTF8 -*-
from lxml import html
hparser = html.HTMLParser(encoding="utf-8")
tree = html.parse(url, hparser)
## stuff here..
a = a_el.text.encode("utf-8")
b = b_el.text.encode("utf-8")
print [a, b]

PHP code:

header("Content-type: text/html; charset=UTF-8");
exec('python script.py "unicode æøå"', $input_ar);

foreach ($input_ar as $value) {

    preg_match_all("/'([^']+)'/", $value, $value_ar);

    //A
    $a = $value_ar[1][0];
    //B
    $b = $value_ar[1][1];

    echo $a."<br>";
    echo $b."<br>";
}

The output in the browser is either an escape sequence such as \xc3\x85 or a question mark inside a black diamond.
I've tried using utf8_encode($string), but it didn't work.

I've also done some research, without luck.
I want to be able to send and receive Unicode characters between PHP and Python.
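
One likely source of the `\xc3\x85` text is that printing a list shows the repr of each element, and the repr of a byte string escapes non-ASCII bytes, so the PHP regex captures the escape text instead of the character. A minimal sketch of one alternative, assuming Python 3 (the script above is Python 2) and the hypothetical helper names `encode_for_php` and `repr_of_encoded`: serialize the values as JSON with `ensure_ascii=False` and emit the result as UTF-8 bytes.

```python
import json


def encode_for_php(values):
    """Serialize a list of text strings as UTF-8 encoded JSON bytes."""
    # ensure_ascii=False keeps characters such as "æøå" as real
    # characters instead of \uXXXX escapes.
    return json.dumps(values, ensure_ascii=False).encode("utf-8")


def repr_of_encoded(values):
    """Show what the repr of a list of UTF-8 byte strings looks like."""
    # This mirrors printing a list of encoded strings: the repr
    # escapes every non-ASCII byte as \xNN.
    return repr([v.encode("utf-8") for v in values])
```

On the PHP side, `json_decode(implode("\n", $input_ar))` would then return the strings without the regex step, assuming the Python script prints nothing but this JSON.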

theusual
  • All files must be UTF-8 as well. Try changing the file encoding to UTF-8 in your IDE. – Mohammad Alabed Aug 20 '15 at 15:39
  • The black diamond with a question mark is a replacement character. It represents characters that are not present in the character set. https://en.wikipedia.org/wiki/Specials_(Unicode_block) – Halcyon Aug 20 '15 at 15:41
  • @MohammadAlabed I just tried with a Python script that only had `print "æøå"` in it, and the question marks came up in the browser. After adding `# -*- coding: UTF8 -*-` at the top of the .py file, it worked fine. I changed the coding on my original script, but I still get `\xc3\x85`. If I do `.encode("utf-8")` in Python, I get `\xc3\x85` in the browser. If I remove it, I get `\xc5`. – theusual Aug 20 '15 at 15:51
  • You also need to make sure that the python executable's stdout stream is UTF-8. Check it with `sys.stdout.encoding` – theB Aug 20 '15 at 15:59
  • @theB When run through PHP's exec(), `sys.stdout.encoding` returned `None`. How do I change it? – theusual Aug 20 '15 at 16:13
  • In that case you probably want to refer to [this](http://stackoverflow.com/a/492711/5240004) – theB Aug 20 '15 at 16:54
  • @theB I tried the example you linked, but that is just what I'm already doing. Since there are only 3 Unicode characters, I can temporarily solve it by replacing them with placeholders like `#1#` and changing them back to the correct character in both Python and PHP. I know it's ugly, but it'll work for now. – theusual Aug 20 '15 at 17:27
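
The comments above report that `sys.stdout.encoding` is `None` when the script runs under `exec`. One option that does not depend on the stream's encoding at all is to encode explicitly and write bytes. A minimal sketch, assuming Python 3, where `sys.stdout.buffer` is the byte stream behind stdout; `write_utf8_line` is a hypothetical helper name:

```python
def write_utf8_line(stream, text):
    """Write one line of text to a binary stream as UTF-8."""
    # Encoding by hand means the output no longer depends on the
    # locale or on whatever encoding stdout was given.
    stream.write(text.encode("utf-8") + b"\n")
    stream.flush()
```

In the script it would be called as `write_utf8_line(sys.stdout.buffer, a_el.text)`. Setting the `PYTHONIOENCODING=utf-8` environment variable for the Python process is another route that may avoid code changes.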

0 Answers