Questions tagged [mecab]

MeCab is an open-source tokenizer and morphological analyser for Japanese, implemented in C++.

MeCab uses a probabilistic approach to split Japanese input into tokens (morphemes or words, depending on the underlying dictionary). It also performs part-of-speech (POS) tagging.

Project page: https://github.com/taku910/mecab
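
As a rough illustration of the token and POS output described above, here is a minimal Python sketch that parses MeCab's default text output. It does not call MeCab itself. It assumes the default output format (surface form, a tab, then comma-separated features, with a final EOS line) and the IPADIC feature layout, in which column 0 is the part of speech, column 6 the base form, and column 7 the reading. Other dictionaries such as UniDic use different columns. The names Token and parse_mecab_output are hypothetical helpers, not part of any MeCab binding.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str    # token text as it appears in the input
    pos: str        # top-level part-of-speech tag (feature column 0)
    base_form: str  # dictionary form (column 6, assuming the IPADIC layout)
    reading: str    # katakana reading (column 7, assuming the IPADIC layout)


def parse_mecab_output(text: str) -> List[Token]:
    """Parse MeCab's default text output into Token tuples.

    Assumes one token per line in the form "surface<TAB>feature,feature,...".
    A plain split on commas is used, so features that themselves contain
    commas are not handled by this sketch.
    """
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        # Unknown words may carry fewer columns; pad with "*" placeholders.
        features += ["*"] * (9 - len(features))
        tokens.append(Token(surface, features[0], features[6], features[7]))
    return tokens


# Sample output for "吾輩は猫である", as quoted in one of the questions below.
SAMPLE = (
    "吾輩\t名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ\n"
    "で\t助動詞,*,*,*,特殊・ダ,連用形,だ,デ,デ\n"
    "EOS\n"
)

for token in parse_mecab_output(SAMPLE):
    print(token.surface, token.pos, token.base_form, token.reading)
```

The same function can be applied to text captured from the mecab command line or from a binding's parse result, provided the output format and dictionary match the assumptions above.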

67 questions
10
votes
1 answer

What is the MeCab output and the tagset?

Can someone enlighten me on the MeCab default output? What annotation does MeCab output, and where can I find the tagset for the morphological analyzer (http://mecab.sourceforge.net/)? Can anyone decipher this output from MeCab? ブギス・ジャンクション ブギス・ジャンクション…
alvas
  • 115,346
  • 109
  • 446
  • 738
5
votes
1 answer

Implementing an older MeCab library into a modern iOS app

I'm trying to use MeCab within a new app that I've been working on, but I'm having trouble getting the library to work correctly. Originally, I tried the following repository, which is supposed to be compatible with iOS…
lxmmxl56
  • 492
  • 4
  • 10
5
votes
0 answers

Can MeCab be configured / enhanced to give me the reading of English words too?

If I begin with a wholly Japanese sentence and run it through MeCab, I get something like this: $ echo "吾輩は猫である" | mecab 吾輩 名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ は 助詞,係助詞,*,*,*,*,は,ハ,ワ 猫 名詞,一般,*,*,*,*,猫,ネコ,ネコ で 助動詞,*,*,*,特殊・ダ,連用形,だ,デ,デ ある…
Hakanai
  • 12,010
  • 10
  • 62
  • 132
4
votes
1 answer

Add new words to the fugashi dictionary

I'm using fugashi to extract words from sentences. How do I add new terms that are not in the fugashi dictionary to the dictionary? For example, YouTube is divided into "You" and "Tube." import fugashi tagger = fugashi.Tagger() nodes =…
Penguin_.
  • 71
  • 1
  • 2
4
votes
1 answer

MeCab: “ImportError: DLL load failed: The specified module could not be found.”

This is my first time using Python. I'm on Windows 10 with Python 3.8. I used "pip mecab-python3" and successfully installed mecab-python3-1.0.1, but something is wrong. The problem is as below: import…
koubunnkei
  • 51
  • 5
4
votes
2 answers

How to install mecab-python3 on macOS using pip

I'm trying to install mecab-python3 with pip install mecab-python3, but I got the following error. Collecting mecab-python3 Using cached…
Daisuke SHIBATO
  • 983
  • 2
  • 11
  • 23
4
votes
3 answers

Config file not found in GitHub package

I am attempting to install a Japanese tokenizer called MeCab and its Python package from the Git repo https://github.com/mcho421/noj/blob/master/installing-mecab-python.md Downloading MeCab itself works fine but when you hit the download…
Yoshi
  • 41
  • 3
4
votes
2 answers

How can I add stopwords to MeCab?

I want to add stopwords, such as 'me' or 'you', to MeCab, but I can't find any information about stopwords in its manual.
3
votes
1 answer

Trying to get libmecab.dll (MeCab) to work with C#

I'm trying to use the Japanese morphological analyzer MeCab in a C# program (Visual Studio 2010 Express, Windows 7), and something's going wrong with the encoding. If my input (pasted into a textbox) is…
snarp
  • 33
  • 4
3
votes
0 answers

Can't figure out output character encoding for MeCab

I'm trying to parse some Japanese text, and I can't seem to figure out the output encoding. This is the output I'm getting: これは ̾��,����,*,*,*,*,* 本 ̾��,����,*,*,*,*,* です ̾��,����,*,*,*,*,* 。 ̾��,������³,*,*,*,*,* EOS Steps I took: git clone…
e-e
  • 1,071
  • 1
  • 11
  • 20
3
votes
2 answers

Issues installing mecab-python3 using pip

Today I've been attempting-- and failing-- to install this guy (MeCab library for Python 3.5+) for the sake of building a simple personalized Japanese readability analysis tool (as a learner of the language and data nerd). Of course, the first thing…
3
votes
1 answer

Is there a description of the MeCab (Japanese word parser) algorithm?

Is there a document somewhere that describes the MeCab algorithm? Or could someone give a simple one-paragraph or one-page description? I'm finding it too hard to understand the existing code, and what the databases contain. I need this…
jtsoftware
  • 521
  • 3
  • 14
3
votes
1 answer

MeCab Not Parsing Correctly

I downloaded MeCab to parse some Japanese text. To test it out, I tried doing what some examples online showed. For example, I followed this guy's tips verbatim: http://www.robfahey.co.uk/blog/japanese-text-analysis-in-python/ The code is as…
user10724070
3
votes
2 answers

Pandas Series.apply doesn't work on a Series consisting of strings

This may be related to a Japanese-language issue, so I also asked on the Japanese Stack Overflow. When I use just a string object, it works fine. I tried encoding the text, but I couldn't find the reason for this error. Could you please give me…
YOSUKE
  • 331
  • 3
  • 13
3
votes
1 answer

Why does 行ける parse into a single token, but 見られる parses into 2 (見+られる)?

Both represent the same form of different types of verbs - shouldn't they both parse into a single token? Even if 2 tokens makes more sense, they should be consistent and both parse into 2, I would think. Edit: it was pointed out in comments that…
Rollie
  • 4,391
  • 3
  • 33
  • 55