
I'm using the KAKASI inverter to convert Kanji characters to Romaji, as in the following example:

echo "日本が好きです。" | kakasi -i euc -Ha -Ka -Ja -Ea -ka -s -iutf8 -outf8
nippon ga suki desu .

KAKASI is a simple dictionary-based Kanji to Kana inverter. Going the other way is a much harder task, but there are methods to convert Kana to Kanji before any conversion to Romaji. KAKASI can also convert Kanji, Katakana, etc. to Hiragana, like this:

echo "7月31日" | iconv -f utf8 -t shift-jis | kakasi -JH -KH -Ea -s | iconv -f shift-jis -t utf8

7 がつ 31 にち

I have built this Dockerfile that ships both kakasi and libskk (which should work as a Kana to Kanji converter), but I cannot figure out how to make it do that. The Dockerfile also includes this collection of SKK dictionaries, already configured to be used as in the libskk examples here:

$ echo "A i SPC" | skk
{ "input": "A i SPC", "output": "", "preedit": "▼愛" }
$ echo "K a p a SPC K a SPC" | skk
{ "input": "K a p a SPC K a SPC", "output": "", "preedit": "▼かぱ【▼蚊】" }
$ echo "r k" | skk -r tutcode
{ "input": "r k", "output": "あ", "preedit": "" }
$ echo "a (usleep 50000) b (usleep 200000)" | skk -r nicola
{ "input": "a (usleep 50000) b (usleep 200000)", "output": "うへ", "preedit": "" }
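Judging only from the examples above, a conversion seems to be requested by typing the first key of a word in uppercase and then sending `SPC` ("A i SPC" gives ▼愛). A minimal sketch of a helper that builds that key-event string for one romaji word; `romaji_to_skk_keys` is a hypothetical name, not part of libskk, and the "uppercase first key + SPC" rule is an assumption drawn from these examples, not from the libskk docs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of libskk): build an skk key-event string
# for a single romaji word. Assumption, based on the libskk examples:
# an uppercase first key starts a conversion and SPC requests a candidate.
romaji_to_skk_keys() {
  local word=$1 keys="" ch i
  for (( i = 0; i < ${#word}; i++ )); do
    ch=${word:i:1}
    # Shift only the first key (bash 4+ for ${var^^})
    if (( i == 0 )); then
      ch=${ch^^}
    fi
    keys+="$ch "
  done
  # Space-separated keys followed by the SPC key event
  printf '%sSPC\n' "$keys"
}
```

For example, `romaji_to_skk_keys ai | skk` feeds skk the same "A i SPC" input as the first example above. libskk's key-event syntax may support more than this; the helper only mirrors what the examples show.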

I would like to use libskk to invert kana to kanji programmatically, possibly using kakasi as an intermediate step. For kakasi I have built a Node.js wrapper here, and I'm trying to do the same with the SKK library.

The current docs are in Japanese. An example that does this is mecab-skkserve, but I cannot find any docs about how it works.

Here are several attempts, based on the few available skk examples:

$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.L
{ "input": "n i p p o n g a s u k i d e s u", "output": "にっぽんがすきです", "preedit": "" }
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.S
{ "input": "n i p p o n g a s u k i d e s u", "output": "にっぽんがすきです", "preedit": "" }
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.M
{ "input": "n i p p o n g a s u k i d e s u", "output": "にっぽんがすきです", "preedit": "" }
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.M -r tutcode
{ "input": "n i p p o n g a s u k i d e s u", "output": "然諾毎度大団", "preedit": "u" }
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.M -r nicola
{ "input": "n i p p o n g a s u k i d e s u", "output": "めく,,つめせうしちきくてたし", "preedit": "" }
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.L.unannotated 
{ "input": "n i p p o n g a s u k i d e s u", "output": "にっぽんがすきです", "preedit": "" }
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.itaiji
{ "input": "n i p p o n g a s u k i d e s u", "output": "にっぽんがすきです", "preedit": "" }
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.jinmei
{ "input": "n i p p o n g a s u k i d e s u", "output": "にっぽんがすきです", "preedit": "" }
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.ML
{ "input": "n i p p o n g a s u k i d e s u", "output": "にっぽんがすきです", "preedit": "" }
$ echo "n i p p o n g a s u k i d e s u" | skk -f ../dict/SKK-JISYO.hukugougo 
{ "input": "n i p p o n g a s u k i d e s u", "output": "にっぽんがすきです", "preedit": "" }
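In all of these attempts every key is lowercase, so, going by the libskk examples, skk probably never enters conversion mode and only turns romaji into hiragana, whichever dictionary is loaded. The `-r tutcode` and `-r nicola` runs switch to different input rules (the earlier example maps "r k" to あ under TUT-Code), which would explain the unrelated characters. A minimal sketch, assuming a hypothetical convention where the caller capitalizes each word that should be converted; `sentence_to_skk_keys` is an illustrative name, not a libskk function.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of libskk): turn a romaji sentence into
# skk key events. Words starting with an uppercase letter are assumed to
# be conversion targets and get a trailing SPC; lowercase words are
# typed as plain kana, like in the attempts above.
sentence_to_skk_keys() {
  local word i
  local -a words keys=()
  read -ra words <<< "$1"
  for word in "${words[@]}"; do
    # One key event per character
    for (( i = 0; i < ${#word}; i++ )); do
      keys+=("${word:i:1}")
    done
    # Uppercase first letter => request a conversion with SPC
    if [[ ${word:0:1} == [[:upper:]] ]]; then
      keys+=("SPC")
    fi
  done
  # Join all key events with single spaces
  printf '%s\n' "${keys[*]}"
}
```

Usage would be something like `sentence_to_skk_keys "Nippon ga" | skk -f ../dict/SKK-JISYO.L`; I have not verified which candidate skk picks. Note that in the first libskk example the converted 愛 appears under "preedit" rather than "output", so reading both fields (e.g. `jq -r '.output + .preedit'`) may be needed.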

And here is a funny pipe between kakasi and skk, with jq extracting the output:

echo "もし君を 許せたら" | kakasi -i euc -Ha -Ka -Ja -Ea -ka -s -iutf8 -outf8 | awk '{gsub(/./,"& ",$0);print}' | skk -f ../dict/SKK-JISYO.M -r tutcode | jq -r .output

Starting from もし君を 許せたら, the result is 養せ校循七野寮.

NOTES

For more info about kakasi, see the official website here.

loretoparisi
