In Python NLTK, I want to get a morphological analysis result for a string without whitespace

Question

I want to get a morphological analysis result from NLTK for a string that contains no whitespace.

For example:

The string is "societynamebank".

I want to get ['society', 'name', 'bank']

How can I get that result with NLTK?

There is no such module in `NLTK`. Are you working with English text or German text? — alvas, Oct 21 '14 at 18:28
English text. I am using the approach from the URL below: http://stackoverflow.com/questions/195010/how-can-i-split-multiple-joined-words — user1371662, Nov 04 '14 at 00:51

score 5 · Answer 1 · answered Nov 07 '14 at 13:41

Here is some simple code that may help. It uses a PyEnchant dictionary to check candidate words, greedily taking the longest valid prefix at each step:

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> tokens=[]
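>>> def tokenize(st):
...     # Greedily take the longest dictionary word at the start of st,
...     # then recurse on the remainder; results accumulate in `tokens`.
...     if not st:
...         return
...     for i in range(len(st), 0, -1):  # i >= 1, so the empty prefix is never checked
...         if d.check(st[:i]):
...             tokens.append(st[:i])
...             tokenize(st[i:])
...             break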
... 
>>> tokenize("societynamebank")
>>> tokens
['society', 'name', 'bank']
>>> tokens=[]
>>> tokenize("HelloSirthereissomethingwrongwiththistext")
>>> tokens
['Hello', 'Sir', 'there', 'is', 'something', 'wrong', 'with', 'this', 'text']
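
Note that the greedy approach above commits to the longest prefix and never reconsiders it, so it can stop early when that choice leaves a remainder that cannot be split. A minimal sketch of a backtracking variant follows. It is not part of the original answer: the `segment` function and the tiny `WORDS` set are hypothetical names used only for illustration, and the set stands in for a real dictionary so the example runs without PyEnchant. Passing `d.check` as `is_word` should work the same way, assuming a dictionary object like the one created above.

```python
from functools import lru_cache

# Hypothetical stand-in vocabulary (an assumption for this sketch); a real
# dictionary check such as enchant.Dict("en_US").check could be passed instead.
WORDS = {"society", "name", "bank", "the", "there", "them", "me", "men", "menu"}


def segment(text, is_word=WORDS.__contains__):
    """Return a list of words that covers all of text, or None if no split exists."""

    @lru_cache(maxsize=None)
    def solve(start):
        # Reaching the end of the string means the split so far is complete.
        if start == len(text):
            return ()
        # Try the longest candidate first, but fall back to shorter ones
        # when the remainder cannot be segmented.
        for end in range(len(text), start, -1):
            if is_word(text[start:end]):
                rest = solve(end)
                if rest is not None:
                    return (text[start:end],) + rest
        return None

    result = solve(0)
    return None if result is None else list(result)


print(segment("societynamebank"))  # ['society', 'name', 'bank']
print(segment("themenu"))          # ['the', 'menu']
```

With this vocabulary, "themenu" shows the difference: the longest first word is "them", which leaves "enu" with no valid split, so the function backtracks to "the" and then finds "menu". The function returns None when no complete split exists, rather than a partial token list.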
