I have an XML feed with English text that I need to translate into my language. The problem is that it does not translate only exact strings, but every occurrence of a search word, even inside other words.

Is there any way to translate only full strings and not everything inside words?

Example:

$string = "Red Cell is very good. Condition is new. But nobody buys it.";
$words = ["Red Cell", "Condition", "no", "Red", "new"];
$translations = ["Red Cell", "Stav", "ne", "Červený", "nový"];

$string = str_replace($words, $translations, $string);

What I get:

Červený Cell is very good. Stav is nevý. But nebody buys it.


What I want:

Red Cell is very good. Stav is nový. But nobody buys it.


Is there any way to translate exact strings only, and not everything that contains those words?

1 Answer

The idea is to build an associative array ($pairs) with the words as keys and the translations as values, and then to build a search pattern with all words in an alternation:

$string = "Red Cell is very good. Condition is new. But nobody buys it.";
$words = ["Red Cell", "Condition", "no", "Red", "new"];
$translations = ["Red Cell", "Stav", "ne", "Červený", "nový"];

$pairs = array_combine($words, $translations);
krsort($pairs);

$pattern = '~\b(?:' . implode('|', array_keys($pairs)) . ')\b~u';

$result = preg_replace_callback($pattern, function ($m) use ($pairs) {
    return $pairs[$m[0]];
}, $string);

echo $result;

To ensure that the longest string is tested first (for example "Red Cell" before "Red"), the words in the pattern are sorted in reverse key order with krsort.
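
Why the order matters can be seen in a small sketch. It assumes only the two overlapping entries "Red" and "Red Cell"; the variable names `$shortFirst` and `$longFirst` are made up for this illustration. In an alternation the regex engine takes the first alternative that lets the whole pattern succeed, not the longest one:

```php
<?php
// Illustration only: two overlapping dictionary entries, tried in both orders.
$text = "Red Cell";

// Shorter alternative first: "Red" matches, the trailing \b succeeds before
// the space, and "Red Cell" is never attempted.
preg_match('~\b(?:Red|Red Cell)\b~u', $text, $shortFirst);

// Longer alternative first: "Red Cell" is tried first and matches as a whole.
preg_match('~\b(?:Red Cell|Red)\b~u', $text, $longFirst);

echo $shortFirst[0], "\n"; // Red
echo $longFirst[0], "\n";  // Red Cell
```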

The advantage of preg_replace_callback with a single pattern over str_replace with arrays is that the string is processed only once, whereas str_replace scans the entire string once per word. A single pass prevents circular replacements, where the output of one replacement is changed again by a later one. Also, since the search parameter is a regex pattern, you can use word boundaries (\b) to make sure a match never starts or ends in the middle of a word.
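
A circular replacement can be reproduced with a reduced, hypothetical dictionary in which one translation ("nový") happens to contain another search word ("no"). The sketch below is only meant to contrast the two approaches; `$sequential` and `$singlePass` are illustrative names:

```php
<?php
// Hypothetical two-entry dictionary: the translation of "new" contains "no".
$pairs = ['new' => 'nový', 'no' => 'ne'];
$text  = 'new';

// str_replace applies the pairs one after another, so the result of the
// first replacement ("nový") is scanned again for "no".
$sequential = str_replace(array_keys($pairs), array_values($pairs), $text);

// A single regex pass only ever looks at the original text.
$singlePass = preg_replace_callback(
    '~\b(?:new|no)\b~u',
    function ($m) use ($pairs) {
        return $pairs[$m[0]];
    },
    $text
);

echo $sequential, "\n"; // nevý
echo $singlePass, "\n"; // nový
```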

Casimir et Hippolyte
  • Thanks for your help and answer. However, I found a problem that I am not able to solve, and I would be happy for any hint. Translating regular phrases like "eBook reader batteries" works without problems. But now I need to translate "Charger | AC Adapter for", and the "|" splits the string as OR logic. I need the regex pattern to ignore the pipe and take the whole string. I tried putting "\" before the pipe, but it doesn't work. I also tried "Charger [\|] AC Adapter for", but that gives me a "Notice: Undefined index: in C:\xampp\htdocs\test\robbie.php on line 147" error. Thanks for any help – Sophia Tibbers Jul 11 '19 at 07:41
  • @SophiaTibbers: when you build the pattern, change `array_keys($pairs)` to `array_map(function ($i) {return preg_quote($i, '~');}, array_keys($pairs))`. `preg_quote` will escape regex special characters like the pipe. – Casimir et Hippolyte Jul 11 '19 at 08:29
  • Do not escape these characters in the `$words` array, because the matched strings are not escaped. – Casimir et Hippolyte Jul 11 '19 at 08:36
  • You are the best, sir. Thanks a lot. – Sophia Tibbers Jul 11 '19 at 09:50
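
For reference, the answer and the `preg_quote` fix from the comments can be combined into one sketch. The function name `translatePhrases` and the placeholder translation `[charger phrase]` are assumptions made for this example and do not come from the thread. It also sorts by length explicitly rather than with krsort, which is a swapped-in variation and not the answer's exact method:

```php
<?php
/**
 * Replace whole words or phrases using a dictionary.
 * Hypothetical helper, sketched from the answer plus the preg_quote comment.
 */
function translatePhrases(string $text, array $pairs): string
{
    $words = array_keys($pairs);

    // Longest phrases first, so "Red Cell" is tried before "Red".
    usort($words, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });

    // Escape regex metacharacters (and the "~" delimiter) in every phrase.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '~');
    }, $words);

    $pattern = '~\b(?:' . implode('|', $quoted) . ')\b~u';

    // The callback looks up the matched text, so dictionary keys stay unescaped.
    return preg_replace_callback($pattern, function ($m) use ($pairs) {
        return $pairs[$m[0]];
    }, $text);
}

$pairs = [
    'Charger | AC Adapter for' => '[charger phrase]', // placeholder translation
    'Red Cell'  => 'Red Cell',
    'Condition' => 'Stav',
    'no'        => 'ne',
    'Red'       => 'Červený',
    'new'       => 'nový',
];

$result = translatePhrases(
    'Charger | AC Adapter for Red Cell. Condition is new. But nobody buys it.',
    $pairs
);

echo $result, "\n";
// [charger phrase] Red Cell. Stav is nový. But nobody buys it.
```

Note that `\b` only helps when a phrase starts and ends with a word character; a phrase beginning or ending with punctuation would need a different boundary check.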