How to get NMecab to output romaji?

Question

I'm using a .NET port of MeCab (called NMeCab) to try to convert Japanese hiragana, katakana, and kanji to romaji.

Here's my code:

using NMeCab;

MeCabTagger _tagger;

public string Parse(string input)
{
    _tagger = MeCabTagger.Create();
    _tagger.OutPutFormatType = "lattice";
    _tagger.LatticeLevel = MeCabLatticeLevel.Two;

    var output = _tagger.Parse(input);

    return output;
}

When I call Parse(input) with the Japanese text "ども", I get this output:

"ども助詞,接続助詞,,,,,ども,ドモ,ドモ EOS"

What I'm looking for is the romaji of "ども", which would be "domo".

I've also tried using MeCab directly, as discussed in this SO answer, but I get the same output.

Pierre · Accepted Answer · 2014-05-19T11:08:56.887


To my knowledge, none of the dictionaries used by MeCab (IPA, Jumandic, or Unidic) includes a romaji transcription of words. And actually there is no need for one:

Several different transcription schemes exist (e.g. Hepburn, Kunrei, 99 siki);
Information on the pronunciation of lexical units is already available (e.g. ドモ).

So you have to write your own transcription routine, or look for an existing katakana-to-romaji transcription module that is compatible with your transcription scheme.
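
To give an idea of what such a routine could look like, here is a minimal sketch of a katakana-to-romaji converter. It assumes Hepburn-style spellings, and the class name KatakanaToRomaji and its Convert method are made up for illustration (they are not part of MeCab or NMeCab). The table is deliberately partial, the long vowel mark simply repeats the previous letter, and the apostrophe after ン before a vowel is not handled, so treat it as a starting point rather than a complete implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of MeCab or NMeCab.
public static class KatakanaToRomaji
{
    // Partial Hepburn-style table written as "kana romaji" pairs.
    // Assumption: ヲ is written "o"; change it if your scheme wants "wo".
    private static readonly Dictionary<string, string> Table = BuildTable(
        "ア a イ i ウ u エ e オ o カ ka キ ki ク ku ケ ke コ ko " +
        "サ sa シ shi ス su セ se ソ so タ ta チ chi ツ tsu テ te ト to " +
        "ナ na ニ ni ヌ nu ネ ne ノ no ハ ha ヒ hi フ fu ヘ he ホ ho " +
        "マ ma ミ mi ム mu メ me モ mo ヤ ya ユ yu ヨ yo " +
        "ラ ra リ ri ル ru レ re ロ ro ワ wa ヲ o ン n " +
        "ガ ga ギ gi グ gu ゲ ge ゴ go ザ za ジ ji ズ zu ゼ ze ゾ zo " +
        "ダ da ヂ ji ヅ zu デ de ド do バ ba ビ bi ブ bu ベ be ボ bo " +
        "パ pa ピ pi プ pu ペ pe ポ po " +
        "キャ kya キュ kyu キョ kyo シャ sha シュ shu ショ sho " +
        "チャ cha チュ chu チョ cho ニャ nya ニュ nyu ニョ nyo " +
        "ヒャ hya ヒュ hyu ヒョ hyo ミャ mya ミュ myu ミョ myo " +
        "リャ rya リュ ryu リョ ryo ギャ gya ギュ gyu ギョ gyo " +
        "ジャ ja ジュ ju ジョ jo ビャ bya ビュ byu ビョ byo " +
        "ピャ pya ピュ pyu ピョ pyo");

    private static Dictionary<string, string> BuildTable(string pairs)
    {
        var parts = pairs.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var table = new Dictionary<string, string>();
        for (int i = 0; i + 1 < parts.Length; i += 2)
        {
            table[parts[i]] = parts[i + 1];
        }
        return table;
    }

    public static string Convert(string katakana)
    {
        var sb = new StringBuilder();
        bool geminate = false; // true after a small ッ

        for (int i = 0; i < katakana.Length; )
        {
            char c = katakana[i];

            if (c == 'ッ')
            {
                geminate = true;
                i++;
                continue;
            }

            if (c == 'ー')
            {
                // Simplification: lengthen by repeating the previous letter.
                if (sb.Length > 0) sb.Append(sb[sb.Length - 1]);
                i++;
                continue;
            }

            string romaji;
            int consumed;
            if (i + 1 < katakana.Length && Table.TryGetValue(katakana.Substring(i, 2), out romaji))
            {
                consumed = 2; // two-character digraph such as キョ
            }
            else if (Table.TryGetValue(katakana.Substring(i, 1), out romaji))
            {
                consumed = 1;
            }
            else
            {
                romaji = c.ToString(); // unknown character: pass through unchanged
                consumed = 1;
            }

            if (geminate)
            {
                // Hepburn writes ッチ as "tchi"; otherwise double the first consonant.
                sb.Append(romaji.StartsWith("ch", StringComparison.Ordinal) ? 't' : romaji[0]);
                geminate = false;
            }

            sb.Append(romaji);
            i += consumed;
        }

        return sb.ToString();
    }
}
```

With this sketch, KatakanaToRomaji.Convert("ドモ") returns "domo". Matching two-character keys before single kana is what keeps キョ from coming out as "kiyo". For another scheme such as Kunrei, only the table entries (and the ッチ special case) would need to change.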

edited May 19 '14 at 11:08

answered May 19 '14 at 10:13



Gotcha, thanks. I thought MeCab handled the romaji conversion. Instead it looks like it simply converts kanji down to hiragana/katakana, so I'll just roll my own kana-to-romaji conversion. – Chaddeus Jun 01 '14 at 22:34

Actually the hiragana/katakana transcription is part of the dictionary... you can have a look at the IPA dictionary source files (*.csv). – Pierre Jun 02 '14 at 09:47
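
Since the reading comes from the dictionary entry, it can be picked out of the comma-separated feature string. The sketch below assumes the IPA dictionary layout, where the reading is the eighth field (index 7) and the pronunciation the ninth; other dictionaries order their fields differently. FeatureReader and GetReading are hypothetical names. As far as I know, NMeCab exposes the feature string per token as MeCabNode.Feature when you walk the result of ParseToNode, but treat that as an assumption and check it against your version.

```csharp
using System;

// Hypothetical helper for illustration; not part of MeCab or NMeCab.
public static class FeatureReader
{
    // Assumed IPA dictionary layout: index 7 = reading, index 8 = pronunciation.
    private const int ReadingIndex = 7;

    // Returns the katakana reading, or null when the entry has none
    // (for example unknown words, whose feature string has fewer fields).
    public static string GetReading(string feature)
    {
        if (string.IsNullOrEmpty(feature)) return null;

        // Simplification: a plain split ignores CSV quoting, so an entry
        // containing a literal comma would be split incorrectly.
        string[] fields = feature.Split(',');
        if (fields.Length <= ReadingIndex) return null;

        string reading = fields[ReadingIndex];
        return reading == "*" ? null : reading;
    }
}
```

For the feature string "助詞,接続助詞,*,*,*,*,ども,ドモ,ドモ" this returns "ドモ", which can then be passed to whatever katakana-to-romaji routine you choose.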
