Is there any way to use a tokenizer from Python (such as MeCab or Sudachi) or Java (such as Kuromoji) in JavaScript? I am making a Chrome extension that shows the five most common words found on Japanese websites. I do not know of another way to split a run of characters into words. So far I have been able to scrape only the Japanese text, so all that remains is to segment it into words.
1 Answer
The quality is not great, but Chrome (and maybe other browsers) does have built-in Japanese tokenization. See this question for details.
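
As a rough sketch of how that could look for the top-five use case: the snippet below assumes the built-in tokenization is reached through the standard `Intl.Segmenter` API with `granularity: "word"` (the answer does not say which API it means, so this is an assumption). The function name `topWords` and the sample sentence are made up for illustration.

```javascript
// Minimal sketch: count the most frequent word-like segments in Japanese text.
// Assumption: Intl.Segmenter is available (recent Chrome and Node.js versions).
function topWords(text, limit = 5) {
  const segmenter = new Intl.Segmenter("ja", { granularity: "word" });
  const counts = new Map();

  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // Skip whitespace and punctuation segments.
    if (!isWordLike) continue;
    counts.set(segment, (counts.get(segment) ?? 0) + 1);
  }

  // Sort by count, highest first; ties keep their first-seen order.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

// Hypothetical usage with text already scraped from the page.
console.log(topWords("吾輩は猫である。名前はまだ無い。"));
```

The result is an array of `[word, count]` pairs. Since the segmentation quality is limited, particles such as は will likely dominate the counts, so a stop-word filter before counting may be worth adding.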

polm23