0
import os
import codecs

#Reading files with .txt extension
def get_sentences():
    for root, dirs, files in os.walk("/Users/Documents/test1"):
        for file in files:
            if file.endswith(".txt"):
                with codecs.open(os.path.join(root, file), "r", "utf-8-sig") as x_:
                    for line in x_.readlines():
                        yield line
formoreprocessing = get_sentences()

#Tokenizing sentences of the text files

from nltk.tokenize import sent_tokenize
for i in formoreprocessing:
    raw_docs = sent_tokenize(i)
    tokenized_docs = [sent_tokenize(i) for sent in raw_docs]

#Removing Stop Words
stopword_removed_sentences = []
from nltk.corpus import stopwords
stopset = set(stopwords.words("english"))
def strip_stopwords(sentence):
    return ' '.join(word for word in sentence.split() if word not in stopset)
stopword_removed_sentences = (strip_stopwords(sentence) for sentence in raw_docs)
print(stopword_removed_sentences)

The above code is not printing what it is supposed to. Instead it prints something like `<generator object <genexpr> at 0x1193417d8>`. What is the mistake here? I am using Python 3.5.

An student
  • "not printing what it is supposed to be." **what do you expect it to print instead**? If you expect a list then use [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) instead of a [generator expression](https://docs.python.org/3.5/reference/expressions.html#generator-expressions), if you expect something else please specify what you expect. – Tadhg McDonald-Jensen Jun 19 '16 at 18:00
  • Possible duplicate of [Generator Expressions vs. List Comprehension](http://stackoverflow.com/questions/47789/generator-expressions-vs-list-comprehension) – Tadhg McDonald-Jensen Jun 19 '16 at 18:01
  • It is supposed to print the text that it receives as input from tokenized_docs, after removing stop words from it. But the expression print(stopword_removed_sentences) is not doing so. – An student Jun 20 '16 at 01:11
  • have you tried using list comprehension to define `stopword_removed_sentences` with `[]` around the expression instead of `()` or `print(list(stopword_removed_sentences))` as JohnColeman has suggested? – Tadhg McDonald-Jensen Jun 20 '16 at 03:22

2 Answers

2

Try `print(list(stopword_removed_sentences))`. This turns the generator into a list before printing it.
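The difference is easy to see with a toy generator (a minimal sketch, not tied to the NLTK code above):

```python
# A generator expression: nothing is computed until it is iterated
gen = (x * x for x in range(3))

# Printing the generator object itself only shows its repr
print(gen)        # e.g. <generator object <genexpr> at 0x...>

# list() consumes the generator and materializes its values
print(list(gen))  # [0, 1, 4]

# A generator can only be consumed once; it is now exhausted
print(list(gen))  # []
```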

John Coleman
0

This is the final answer; it provides the best result, resolving the problem that I mentioned in my previous comment.

from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords

#The generator can only be consumed once, so join its output a single time
text = ''.join(formoreprocessing)
raw_docs = sent_tokenize(text)
#print(raw_docs)
tokenized_docs = [sent_tokenize(sent) for sent in raw_docs]
#Removing Stop Words
stopset = set(stopwords.words("english"))
def strip_stopwords(sentence):
    return ' '.join(word for word in sentence.split() if word not in stopset)
stopword_removed_sentences = (strip_stopwords(sentence) for sentence in raw_docs)
print(list(stopword_removed_sentences))
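For reference, the stop-word stripping step works like this in isolation (a minimal sketch using a small hardcoded stop set in place of NLTK's `stopwords.words("english")`, so it runs without the corpus download):

```python
# Hardcoded stand-in for NLTK's English stop-word list
stopset = {"the", "a", "is", "of"}

def strip_stopwords(sentence):
    # Keep only the words that are not in the stop set
    return ' '.join(word for word in sentence.split() if word not in stopset)

print(strip_stopwords("this is a test of the tokenizer"))  # this test tokenizer
```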
An student