I am quite a noob at Python and programming in general.
I am trying to use NLTK for my dissertation in Applied Linguistics, but something keeps preventing the NLTK tools from working on my dataset.
I've tried some code in the copy+paste+modify style, but had no success. How should I prepare my dataset so that I can apply NLTK to it (e.g. finding the percentage of punctuation in each sentence, counting/eliminating stopwords, etc.)? I've already applied those operations to another dataset, whose entries are plain text, not wrapped in these "['']" strings.
ds = {0: "['sentences I need to parse.']",
      1: "['word1', 'word2', 'word3']",
      2: "['sentences and words']",
      3: "['Natural language processing.']",
      4: "['Further tokenization is needed.']",
      5: "['Is it a question?']",
      6: "['You\'re a real noob.']"}
The output I am trying to obtain is:
sentences I need to parse
word1, word2, word3
sentences and words
Natural language processing.
Further tokenization is needed.
Is it a question?
You're a real noob.
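
For what it's worth, here is a rough sketch of the kind of cleanup step I have in mind. I'm assuming the values really are string representations of Python lists, as shown above; the to_text helper is just something I made up, not anything from NLTK. Is something like this the right direction, or is there a better way?

import ast
import string

from nltk.tokenize import word_tokenize   # needs: nltk.download('punkt')
from nltk.corpus import stopwords          # needs: nltk.download('stopwords')

ds = {0: "['sentences I need to parse.']",
      1: "['word1', 'word2', 'word3']",
      2: "['sentences and words']",
      3: "['Natural language processing.']",
      4: "['Further tokenization is needed.']",
      5: "['Is it a question?']",
      6: "['You\'re a real noob.']"}

def to_text(value):
    """Turn a string such as "['a', 'b']" back into plain text 'a, b'."""
    try:
        items = ast.literal_eval(value)            # "['a', 'b']" -> ['a', 'b']
    except (ValueError, SyntaxError):
        # Entries with a stray apostrophe (like "You're") are not valid
        # Python literals, so fall back to stripping the wrapper by hand.
        items = value.strip("[]").strip("'\"").split("', '")
    return ', '.join(items)

cleaned = {key: to_text(value) for key, value in ds.items()}

for text in cleaned.values():
    print(text)

# Once the values are plain strings, the usual NLTK steps seem to work, e.g.:
stop_words = set(stopwords.words('english'))
for text in cleaned.values():
    tokens = word_tokenize(text)
    no_stopwords = [t for t in tokens if t.lower() not in stop_words]
    punct_ratio = sum(t in string.punctuation for t in tokens) / len(tokens)
    print(no_stopwords, round(punct_ratio, 2))

This prints the cleaned sentences and then, per sentence, the tokens with stopwords removed and the share of punctuation tokens, which is roughly what I am after.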