I want to replace words in a given Greek text, for example with English words.

Here is an example:

var str = "Ενώ Gallant δεν λειτουργεί στην προσπάθεια να χτίσει πια μηχανές αποκωδικοποίησης, οι άλλοι.";

function findword() {
  // \b is kept as in my attempt; this is the part that fails on Greek text
  var word = new RegExp("\\b(προσπάθεια)\\b", "gi");
  var sust = str.replace(word, 'effort');
  return sust;
}

It should return: "Ενώ Gallant δεν λειτουργεί στην effort να χτίσει πια μηχανές αποκωδικοποίησης, οι άλλοι."

I tried to do this in JavaScript and failed. I have read that it is not possible because the language does not handle Unicode characters outside the English alphabet. The only option I found is XRegExp, but it seems to work only for detecting character classes, not individual words. Is it really impossible to make this work in JavaScript?

The Python 3 documentation states that the language can handle Unicode characters, but in this case it seems necessary to write the characters as Unicode code points... In which language would it be possible to replace words in the simplest way, as in the code I wrote? Python, Java, Perl ...?

theonlygusti
  • Related: [Javascript + Unicode regexes](http://stackoverflow.com/q/280712) – Martijn Pieters Jul 19 '14 at 14:01
  • JavaScript can handle all Unicode characters. It just has very poor tools for that. In particular, in JavaScript regexps, only basic Latin letters are “word” characters, so `\b` won’t work here. The problem needs to be defined more exactly: what do you want to replace, under which conditions? And the answers are more complicated than you expect. – Jukka K. Korpela Jul 19 '14 at 14:32
  • Defining more exactly. – Αριάδνη Jul 19 '14 at 15:16
  • I would like to handle words that are not composed of basic Latin characters, such as Greek words; that is, to be able to match them or replace them. I've noticed that \b works very badly, since it only recognizes basic Latin characters (as Jukka K. Korpela said). It seems there is no good solution for this gap, because it is not possible to create an elegant, working alternative word boundary. But when I tried this code it did not work at all, so it seems JS does not recognize this set of characters. What I was trying to ask is whether there is any way to do it, even in a different language. Thanks – Αριάδνη Jul 19 '14 at 15:26
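
The point made in the comments about `\b` can be checked directly. The following is a minimal sketch, assuming the sentence from the question; the variable names (`latinHit`, `greekHit`, `unchanged`) are made up for illustration. It suggests that the Greek characters themselves are handled fine, and that only the word-boundary assertion fails, because `\b` is defined in terms of `\w`, which covers just `[A-Za-z0-9_]`.

```javascript
// The sentence from the question.
const str = "Ενώ Gallant δεν λειτουργεί στην προσπάθεια να χτίσει πια μηχανές αποκωδικοποίησης, οι άλλοι.";

// \b needs a \w character on exactly one side of the position.
const latinHit = /\bGallant\b/.test(str);     // Latin word: boundaries exist
const greekHit = /\bπροσπάθεια\b/.test(str);  // Greek word: space and Greek letter are both non-\w

// With no match, replace() returns the original string untouched.
const unchanged = str.replace(/\bπροσπάθεια\b/gi, "effort");

console.log(latinHit, greekHit, unchanged === str); // true false true
```

So the pattern silently matches nothing rather than throwing an error, which is why the original attempt appears to "not work at all".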

2 Answers

This should do it:

'Ενώ Gallant δεν λειτουργεί στην προσπάθεια να χτίσει πια μηχανές αποκωδικοποίησης, οι άλλοι.'.replace( /(προσπάθεια)/g, 'effort' )

Edit

I think this does exactly what you want:

String.prototype.translate = function translate( greek, english ) {
  return this.replace( new RegExp( '(' + greek + ')', 'g' ), english );
}

var translatedString = 'Ενώ Gallant δεν λειτουργεί στην προσπάθεια να χτίσει πια μηχανές αποκωδικοποίησης, οι άλλοι.'.translate( 'προσπάθεια', 'effort' );
console.log( translatedString );
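
Note that a pattern without any boundary check also matches inside longer words. If whole-word matching is needed, one possible approach is to emulate `\b` with lookarounds built on Unicode property escapes. This is only a sketch under assumptions: it needs an engine with ES2018 support (the `u` flag, `\p{...}`, and lookbehind), `replaceWord` is a hypothetical helper that is not part of this answer, and treating letters, marks, digits, and underscore as "word characters" is a choice made here for illustration. Matching is case-sensitive; adding the `i` flag would be a further assumption.

```javascript
// Hypothetical helper: replace `word` only where it stands alone as a word.
function replaceWord(text, word, replacement) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Assumed definition of a word character: any Unicode letter, mark, or digit, or "_".
  const wc = "[\\p{L}\\p{M}\\p{N}_]";
  // No word character directly before or after the match.
  const pattern = new RegExp(`(?<!${wc})${escaped}(?!${wc})`, "gu");
  // A function replacement avoids special handling of "$" in `replacement`.
  return text.replace(pattern, () => replacement);
}

const sentence = "Ενώ Gallant δεν λειτουργεί στην προσπάθεια να χτίσει πια μηχανές αποκωδικοποίησης, οι άλλοι.";
console.log(replaceWord(sentence, "προσπάθεια", "effort"));

// A longer word that merely starts with the target is left alone.
console.log(replaceWord("προσπάθειας", "προσπάθεια", "effort"));
```

The first call should replace the standalone word, while the second should return its input unchanged because a letter follows the match.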
Bram

Perl has exceptional Unicode handling. For example, the following code:

use 5.016;
use warnings;
use utf8;
use open qw(:std :utf8);

my $str= "Ενώ Gallant δεν λειτουργεί στην προσπάθεια να χτίσει πια μηχανές αποκωδικοποίησης, οι άλλοι.";
$str =~ s/\bπροσπάθεια\b/effort/g;
say $str;

prints

Ενώ Gallant δεν λειτουργεί στην effort να χτίσει πια μηχανές αποκωδικοποίησης, οι άλλοι.
Slade
clt60