-2

How do I break a sentence into individual words in Python? I understand this is a Natural Language Processing (NLP) task. Should I use NLTK or a parser? I'm confused about which method fits my case.

Def Putra

2 Answers

1

If you want to find all the words a sentence contains, i.e. tokenization, then use NLTK:

import nltk
tokens = nltk.word_tokenize(sentence)

Note that simply splitting on whitespace with sentence.split() works worse, since punctuation stays attached to the neighbouring words.

In particular, 'This quickly comes into problems when an abbreviation is processed. “etc.” would be interpreted as a sentence terminator, and “U.N.E.S.C.O.” would be interpreted as six individual sentences, when both should be treated as single word tokens. How should hyphens be interpreted? What about speech marks and apostrophes?'

Or take a look at another source: "you chop on whitespace and throw away punctuation characters. This is a starting point, but even for English there are a number of tricky cases. For example, what do you do about the various uses of the apostrophe for possession and contractions?

Mr. O'Neill thinks that the boys' stories about Chile's capital aren't amusing.

A simple strategy is to just split on all non-alphanumeric characters, but while o neill looks okay, aren t looks intuitively bad."
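To make the quoted example concrete, here is a minimal sketch of the "split on all non-alphanumeric characters" strategy, followed by a slightly more careful regex that keeps word-internal apostrophes. Both patterns are my own illustrations using only the standard `re` module. They are not what NLTK does internally, and the variable names are made up for this example.

```python
import re

sentence = ("Mr. O'Neill thinks that the boys' stories about "
            "Chile's capital aren't amusing.")

# Naive strategy: keep only runs of ASCII letters and digits.
# Apostrophes act as separators, so "aren't" becomes "aren" and "t".
naive = re.findall(r"[A-Za-z0-9]+", sentence)

# Illustrative refinement (an assumption, not NLTK's algorithm):
# allow an apostrophe inside a token when alphanumerics follow it.
apostrophe_aware = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", sentence)

print(naive)
print(apostrophe_aware)
```

The second pattern keeps O'Neill, Chile's and aren't whole, but it still drops the period from "Mr." and the trailing apostrophe from "boys'", which is the kind of case a real tokenizer is meant to handle.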

Nikita Astrakhantsev
-2

Without the Natural Language Toolkit (NLTK), you can use plain Python's str.split() as follows.

>>> line = "a sentence with a few words"
>>> line.split()
['a', 'sentence', 'with', 'a', 'few', 'words']

This is the approach given in "How to split a string into a list?"
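If the sentence contains punctuation, split() leaves it attached to the words. A rough sketch of one way to deal with that, assuming you only want to trim punctuation from the ends of each word (the sample sentence and the variable names are just for illustration):

```python
import string

line = "A sentence, with a few words."

# Split on whitespace, then trim leading/trailing ASCII punctuation.
words = [w.strip(string.punctuation) for w in line.split()]

# Drop anything that was punctuation only and is now empty.
words = [w for w in words if w]

print(words)  # ['A', 'sentence', 'with', 'a', 'few', 'words']
```

This keeps apostrophes inside words such as "aren't", since strip() only touches the ends of each string.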

Community