In my PHP script I try to send UTF-8 characters to the Google Translate website so that it sends back a translation of the text, but this doesn't work for non-ASCII text such as Chinese, Arabic and Russian, and I can't figure out why. If I want to translate 'как дела' to English, I can use this link: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=как дела

And it would return this: [[["how are you","как дела",,,1]],,"ru"]

A fine translation, exactly what I wanted. But when I try to recreate it in PHP, I do this (I start from bytes because my future script will use bytes as its starting point):

<?php
$bytes = array(1082,1072,1082,32,1076,1077,1083,1072); // Unicode code points (not raw bytes) of: как дела
$str = "";

for($i = 0; $i < count($bytes); ++$i) {
    $str .= json_decode('"\u' . '0' . strtoupper(dechex($bytes[$i])) . '"'); // returns string: как дела
}

$from = 'ru';
$to = 'en';
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $str;
$call = fopen($url,"r");
$contents = fread($call,2048);

print $contents;
?>

And it outputs: [[["RєR RєRґRμR ° \"° F","какдела",,,0]],,"ru"]

The output doesn't make sense; it appears that my PHP script sent the string 'какдела' to be translated to English. I read something about making UTF-8 characters readable for Google in a URI (or URL). It says I should convert my bytes to UTF-8 code units and put them in my URL. I haven't yet figured out how to convert bytes to UTF-8 code units, but I first wanted to check whether it would work. I started by converting my text 'как дела' to code units by hand (percent-encoded for the URL). This resulted in the following link: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=%D0%BA%D0%B0%D0%BA+%D0%B4%D0%B5%D0%BB%D0%B0

When tested in the browser, it returns: [[["how are you","как дела",,,1]],,"ru"]

Again a fine translation. It appears to work, so I tried to implement it in my script with the following code:

<?php
$from = 'ru';
$to = 'en';
$text = "%D0%BA%D0%B0%D0%BA+%D0%B4%D0%B5%D0%BB%D0%B0"; // code units of: как дела
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $text;
$call = fopen($url,"r");
$contents = fread($call,2048);

print $contents;
?>

This script outputs: [[["RєR Rє RґRμR ° \"° F","как дела",,,0]],,"ru"]

Again, my script doesn't output what I want, nor what I get when I test these URLs in my own browser. I can't figure out what I'm doing wrong and why Google responds with garbled characters when I use the link in my PHP file.

Does someone know how to get the output I want? Thanks in advance!

Updated code to set strings to UTF-8 (not working)

I added a lot of settings at the top of the PHP file to make sure everything is in UTF-8. I also added an mb_convert_encoding call halfway through, but the output is still wrong. The fopen call doesn't seem to send the right UTF-8 string to Google.

Output I get:

URL: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=%D0%BA%D0%B0%D0%BA%20%D0%B4%D0%B5%D0%BB%D0%B0
Encoding: ASCII
File contents: [[["RєR Rє RґRμR ° \"° F","как дела",,,0]],,"ru"]

Code I use:

<?php
header('Content-Type: text/html; charset=utf-8');
$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');

$from = 'ru';
$to = 'en';
$text = rawurlencode('как дела');
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $text;
$url = mb_convert_encoding($url, "UTF-8", "ASCII");
$call = fopen($url,"r");
$contents = fread($call,2048);

print 'URL: ' . $url . '<br>';
print 'Encoding: ' . mb_detect_encoding($url) . '<br>';
print 'File contents: ' . $contents;
?>
  • Your PHP is not UTF-8 encoded by default; you need to set this manually in your PHP code. [Read about UTF-8](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through). You need to set the [`mbstring`](http://www.php.net/manual/en/book.mbstring.php) settings. – Martin Feb 18 '17 at 19:08
  • ***Edit*** your question; don't post code in comments, it's pretty unreadable. Cheers – Martin Feb 19 '17 at 15:42
  • I updated the opening post. –  Feb 19 '17 at 18:28

1 Answer

Solved! I got a hint from someone outside these forums to look at this Stack Overflow post about setting a user agent. After some more research I found that this answer was the solution to my problem: once the request is sent with a user agent set, everything works fine!
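
For anyone landing here, below is a minimal sketch of what setting a user agent can look like with PHP's stream contexts. The helper names (`codePointsToUtf8`, `buildTranslateUrl`, `fetchWithUserAgent`) and the User-Agent string are my own illustrative assumptions, not taken from the linked answer, and I have not confirmed why the endpoint behaves differently without the header. It also uses `mb_chr()` (PHP 7.2+ with mbstring) instead of hand-built `\u` escapes, because a `\u` escape needs exactly four hex digits and the space (code point 32) is otherwise lost.

```php
<?php
// Sketch only: function names and the User-Agent value are illustrative assumptions.

// Turn Unicode code points into a UTF-8 string (mb_chr needs PHP 7.2+ with mbstring).
function codePointsToUtf8(array $codePoints): string
{
    $str = '';
    foreach ($codePoints as $cp) {
        $str .= mb_chr($cp, 'UTF-8');
    }
    return $str;
}

// Build the request URL; rawurlencode() percent-encodes the UTF-8 bytes of each value.
function buildTranslateUrl(string $from, string $to, string $text): string
{
    return 'https://translate.googleapis.com/translate_a/single?client=gtx'
        . '&sl=' . rawurlencode($from)
        . '&tl=' . rawurlencode($to)
        . '&dt=t&q=' . rawurlencode($text);
}

// Fetch a URL with an explicit User-Agent header set through a stream context.
// Returns the response body, or false on failure.
function fetchWithUserAgent(string $url, string $userAgent)
{
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
        ],
    ]);
    return file_get_contents($url, false, $context);
}

$text = codePointsToUtf8([1082, 1072, 1082, 32, 1076, 1077, 1083, 1072]); // как дела
$url  = buildTranslateUrl('ru', 'en', $text);
```

The request itself would then be something like `fetchWithUserAgent($url, 'Mozilla/5.0 (compatible; ExampleClient/1.0)')`. The `http` context options also apply to `https://` URLs, and `file_get_contents()` reads the whole response rather than only the first 2048 bytes that `fread()` was asked for in the question.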
