
Is there any way to use a tokenizer written for Python (such as MeCab or Sudachi) or Java (such as kuromoji) from JavaScript? I am building a Chrome extension that reports the five most common words on a Japanese web page. I have already managed to scrape just the Japanese text, so the only remaining step is splitting that text into words, and I don't know of any way to get words out of a plain run of characters other than with a tokenizer.

Steak

1 Answer


The quality is not great, but Chrome (and maybe other browsers) does have built-in Japanese tokenization. See this question for details.
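As a rough sketch of what that could look like: one standard API that exposes browser-side word segmentation is `Intl.Segmenter`. Whether this is the exact mechanism the linked question describes is an assumption on my part, and the helper name `topWords` is made up for illustration. The word boundaries come from the browser's bundled dictionary, so they will be coarser than what MeCab or Sudachi produce.

```javascript
// Hypothetical helper: count the most frequent word-like segments in a string.
// Assumes Intl.Segmenter is available (recent Chrome versions ship it).
function topWords(text, limit = 5) {
  const segmenter = new Intl.Segmenter("ja", { granularity: "word" });
  const counts = new Map();

  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation, so skip those.
    if (!isWordLike) continue;
    counts.set(segment, (counts.get(segment) ?? 0) + 1);
  }

  // Sort by count, highest first, and keep the first `limit` entries.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}
```

Calling `topWords(scrapedText)` on the text you already scraped returns up to five `[word, count]` pairs. In practice you would probably also want to filter out particles and other very common function words, since they tend to dominate raw frequency counts.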

polm23