It's seems possible to relate with Japanese Language problem, So I asked in Japanese StackOverflow also.
When I use string just object, it works fine.
I tried to encode but I couldn't find the reason of this error. Could you please give me advice?
MeCab is an open source text segmentation library for use with text written in the Japanese language originally developed by the Nara Institute of Science and Technology and currently maintained by Taku Kudou (工藤拓) as part of his work on the Google Japanese Input project. https://en.wikipedia.org/wiki/MeCab
sample.csv
0,今日も夜まで働きました。
1,オフィスには誰もいませんが、エラーと格闘中
2,デバッグばかりしていますが、どうにもなりません。
This is Pandas Python3 code
import pandas as pd
import MeCab
# https://en.wikipedia.org/wiki/MeCab
from tqdm import tqdm_notebook as tqdm
# This is working...
df = pd.read_csv('sample.csv', encoding='utf-8')
m = MeCab.Tagger ("-Ochasen")
text = "りんごを食べました、そして、みかんも食べました"
a = m.parse(text)
print(a)# working!
# But I want to use Pandas's Series
def extractKeyword(text):
"""Morphological analysis of text and returning a list of only nouns"""
tagger = MeCab.Tagger('-Ochasen')
node = tagger.parseToNode(text)
keywords = []
while node:
if node.feature.split(",")[0] == u"名詞": # this means noun
keywords.append(node.surface)
node = node.next
return keywords
aa = extractKeyword(text) #working!!
me = df.apply(lambda x: extractKeyword(x))
#TypeError: ("in method 'Tagger_parseToNode', argument 2 of type 'char const *'", 'occurred at index 0')
This is the trace error
りんご リンゴ りんご 名詞-一般
を ヲ を 助詞-格助詞-一般
食べ タベ 食べる 動詞-自立 一段 連用形
まし マシ ます 助動詞 特殊・マス 連用形
た タ た 助動詞 特殊・タ 基本形
、 、 、 記号-読点
そして ソシテ そして 接続詞
、 、 、 記号-読点
みかん ミカン みかん 名詞-一般
も モ も 助詞-係助詞
食べ タベ 食べる 動詞-自立 一段 連用形
まし マシ ます 助動詞 特殊・マス 連用形
た タ た 助動詞 特殊・タ 基本形
EOS
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-174-81a0d5d62dc4> in <module>()
32 aa = extractKeyword(text) #working!!
33
---> 34 me = df.apply(lambda x: extractKeyword(x))
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4260 f, axis,
4261 reduce=reduce,
-> 4262 ignore_failures=ignore_failures)
4263 else:
4264 return self._apply_broadcast(f, axis)
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4356 try:
4357 for i, v in enumerate(series_gen):
-> 4358 results[i] = func(v)
4359 keys.append(v.name)
4360 except Exception as e:
<ipython-input-174-81a0d5d62dc4> in <lambda>(x)
32 aa = extractKeyword(text) #working!!
33
---> 34 me = df.apply(lambda x: extractKeyword(x))
<ipython-input-174-81a0d5d62dc4> in extractKeyword(text)
20 """Morphological analysis of text and returning a list of only nouns"""
21 tagger = MeCab.Tagger('-Ochasen')
---> 22 node = tagger.parseToNode(text)
23 keywords = []
24 while node:
~/anaconda3/lib/python3.6/site-packages/MeCab.py in parseToNode(self, *args)
280 __repr__ = _swig_repr
281 def parse(self, *args): return _MeCab.Tagger_parse(self, *args)
--> 282 def parseToNode(self, *args): return _MeCab.Tagger_parseToNode(self, *args)
283 def parseNBest(self, *args): return _MeCab.Tagger_parseNBest(self, *args)
284 def parseNBestInit(self, *args): return _MeCab.Tagger_parseNBestInit(self, *args)
TypeError: ("in method 'Tagger_parseToNode', argument 2 of type 'char const *'", 'occurred at index 0')w