
Transliterator::listIDs() will list IDs, but apparently it's not a complete list.

In the example from this page, the ID looks like:

Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();

which is kind of weird, because IDs are supposed to be unique. This looks more like a rule, but it doesn't work if I pass it to the createFromRules method :)

Anyway, I'm trying to remove all punctuation from the string except the dash (-) or other characters from a specific list.

Do you know if that's possible? Or is there some documentation that better explains the syntax for the transliterator?



The IDs that Transliterator::listIDs() returns are the "basic IDs". The example you gave is a "compound ID", a chain of basic IDs separated by semicolons. You can see the ICU docs on this.
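A minimal sketch to make the basic/compound distinction concrete (assuming PHP 7.4+ with the intl extension; the variable names are illustrative). The full compound string is not in the listIDs() output, but Transliterator::create() accepts each semicolon-separated element on its own as well as the whole chain in one call.

```php
<?php
// Sketch: a compound ID is a ';'-separated chain of basic IDs,
// some of them prefixed with a UnicodeSet filter such as [:Punctuation:].
$compoundId = 'Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();';

// The compound string itself is not one of the registered basic IDs.
var_dump(in_array($compoundId, Transliterator::listIDs(), true)); // bool(false)

// Each element can be created on its own...
$parts = array_filter(array_map('trim', explode(';', $compoundId)), fn($p) => $p !== '');
foreach ($parts as $part) {
    $t = Transliterator::create($part);
    echo $part, ' => ', $t === null ? 'invalid' : 'ok', "\n";
}

// ...and create() also accepts the whole chain at once.
$chain = Transliterator::create($compoundId);
echo $chain->transliterate('Héllo, Wörld!'), "\n"; // hello world
```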

You can also create your own rules with Transliterator::createFromRules().
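For the goal in the question (strip punctuation but keep the dash), a sketch with createFromRules() might look like the following, assuming PHP 7.4+ with intl; the function name is hypothetical. In rule syntax, each step of a chain is written as a `::ID;` statement, which is likely why passing the bare compound ID to createFromRules() fails. The filter [[:Punctuation:]-[\-]] is a UnicodeSet difference: all punctuation minus the hyphen-minus. Characters added inside the second bracket are kept as well.

```php
<?php
// Hypothetical helper: remove all punctuation except the hyphen-minus.
function removePunctuationExceptDash(string $text): string
{
    // '::' invokes a transliterator by ID inside a rule set.
    // The filter is [:Punctuation:] minus the characters listed in [\-].
    $rules = ':: [[:Punctuation:]-[\-]] Remove ;';

    $t = Transliterator::createFromRules($rules, Transliterator::FORWARD);
    if ($t === null) {
        throw new RuntimeException(intl_get_error_message());
    }
    return $t->transliterate($text);
}

echo removePunctuationExceptDash("well-known, isn't it?"), "\n"; // well-known isnt it
```

This builds the transliterator on every call to keep the sketch short; in real code you would create it once and reuse it.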

You can take a look at the predefined rules:

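<?php
// List the rule-based transliterators in ICU's translit data bundle.
// The bundle name embeds the ICU major version taken from INTL_ICU_VERSION.
$a = new ResourceBundle(NULL, sprintf('icudt%dl-translit', INTL_ICU_VERSION), true);

foreach ($a['RuleBasedTransliteratorIDs'] as $name => $v) {
    $file = @$v['file'];            // entries backed by a rule file
    if (!$file) {
        $file = $v['internal'];     // entries marked "internal"
        echo $name, " (direction: $file[direction]; internal)\n";
    } else {
        echo $name, " (direction: $file[direction])\n";
        echo $file['resource'];     // the rule text itself
    }
    echo "\n--------------\n";
}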

After formatting, the result looks like this.

Friendly reminder: that's a pretty intense .txt file for a machine low on memory; Chrome and Sublime Text may stop responding while handling it. – bitinn Dec 16 '13 at 07:25

Just in case someone wants a working example: the example mentioned (from the PHP manual) uses the procedural style. To do the same with the object-oriented API, use Transliterator::create() instead of Transliterator::createFromRules():

function removePunctuation($string) {
    // create() takes a (compound) transliterator ID; createFromRules() expects rule syntax
    $transliterator = Transliterator::create("Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove;", \Transliterator::FORWARD);

    return $transliterator->transliterate($string);
}
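The same object-oriented approach can also cover the dash requirement from the question if the [:Punctuation:] filter is replaced by a set difference. A sketch assuming PHP 7.4+ with intl; the function name and the static caching are illustrative, not from the PHP manual:

```php
<?php
// Hypothetical variant of removePunctuation(): keep '-' and lowercase the result.
function cleanKeepDash(string $text): string
{
    // Create the transliterator once and reuse it on later calls.
    static $t = null;
    if ($t === null) {
        $t = Transliterator::create(
            'Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; '
            . '[[:Punctuation:]-[\-]] Remove; Lower();'
        );
        if ($t === null) {
            throw new RuntimeException(intl_get_error_message());
        }
    }
    return $t->transliterate($text);
}

echo cleanKeepDash('Héllo, wörld - ça va?'), "\n"; // hello world - ca va
```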