I want to add stopwords, such as 'me' or 'you', to MeCab, but I can't find any information about stopwords in its manual.
-
What do you mean by stopword? MeCab is a tokenizer and POS tagger, not a document classifier or information retrieval engine. What do you expect it to do about stop words? Or is this about adding entries to the user dictionary? – jogojapan Oct 26 '12 at 09:04
-
Stopwords are words that are filtered out during text processing. How you inject or use them depends on your task. What is your task, and what do you need the stop words for? – alvas Jan 22 '13 at 08:55
-
Have you solved this problem? I think I am facing the same one. If you have solved it, could you please add the solution below? – Boli-CS Mar 06 '14 at 14:04
2 Answers
0
MeCab is a part-of-speech tagger; it doesn't do stopword removal.
You need to remove stopwords yourself by processing output and looking at surface forms (the literal token), base forms (the lemmatized canonical form), or part of speech.
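
As a rough sketch of that post-processing step, the Python below parses text in MeCab's default output format (one `surface<TAB>features` line per token, terminated by `EOS`) and filters on all three criteria. It assumes IPADIC-style features, where the first field is the part of speech and the seventh is the base form; other dictionaries may lay fields out differently. The sample output is hand-written so the snippet runs without MeCab installed, and the stop lists and the function names `parse_mecab_output` and `remove_stopwords` are made up for illustration.

```python
# Hand-written sample in MeCab's default output format (IPADIC-style features).
# In real use this string would come from MeCab itself.
SAMPLE_OUTPUT = (
    "私\t名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "本\t名詞,一般,*,*,*,*,本,ホン,ホン\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "読ん\t動詞,自立,*,*,五段・マ行,連用タ接続,読む,ヨン,ヨン\n"
    "で\t助詞,接続助詞,*,*,*,*,で,デ,デ\n"
    "い\t動詞,非自立,*,*,一段,連用形,いる,イ,イ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)

# Illustrative stop lists (assumptions, not part of MeCab).
STOP_SURFACES = {"私"}          # match the literal token
STOP_BASE_FORMS = {"いる"}      # match the lemmatized form
STOP_POS = {"助詞", "助動詞"}   # match the top-level part of speech


def parse_mecab_output(output):
    """Return (surface, base_form, pos) tuples from MeCab's default output."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, feature = line.partition("\t")
        fields = feature.split(",")
        pos = fields[0]
        # Assumption: IPADIC puts the base form in the 7th field.
        # Fall back to the surface form when it is missing or "*".
        base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        tokens.append((surface, base, pos))
    return tokens


def remove_stopwords(tokens):
    """Keep the surface forms of tokens that match none of the stop lists."""
    return [
        surface
        for surface, base, pos in tokens
        if surface not in STOP_SURFACES
        and base not in STOP_BASE_FORMS
        and pos not in STOP_POS
    ]


kept = remove_stopwords(parse_mecab_output(SAMPLE_OUTPUT))
print(kept)
```

Which of the three criteria to use depends on the task: matching on base forms catches every inflection of a word, while filtering by part of speech drops whole classes such as particles without listing them individually.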

polm23
0
I don't think you need to add stopwords to MeCab itself. You can remove them after MeCab has returned the tokenized data, either by pattern matching (for example, x.replace("stopword", "") in Python) or by POS tag (dropping terms with specific tags).
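
One caveat with the pattern-matching route, sketched below: calling `replace` on the whole output string also removes the stopword where it occurs inside a longer token, so comparing whole tokens is usually safer. The example assumes space-separated output such as `mecab -Owakati` produces; the input line is hand-written, and the stop list and the function name `remove_stopwords_wakati` are made up for illustration.

```python
# Illustrative stop list (an assumption, not something MeCab provides).
STOPWORDS = {"は", "です"}


def remove_stopwords_wakati(wakati_line):
    """Drop tokens that exactly match a stopword from space-separated output."""
    return [token for token in wakati_line.split() if token not in STOPWORDS]


# Hand-written line in the space-separated (wakati) style.
line = "はさみ は 便利 です"

# Substring replacement also strips "は" from inside the token "はさみ".
naive_tokens = line.replace("は", "").split()

# Whole-token comparison leaves "はさみ" intact.
filtered_tokens = remove_stopwords_wakati(line)

print(naive_tokens)
print(filtered_tokens)
```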

SUM