
In Python 3.x, I've written the following code:

One function for tokenizing text ("tokenize_text") and another for removing special characters ("remove_characters_after_tokenization"). In the "remove_characters_after_tokenization" function I used "filter".

My problem: when I run my project, I see this line in the console:

<filter object at 0x00000277AA20DE48> <filter object at 0x00000277AA44D160> <filter object at 0x00000277AA44D470>

How can I solve this issue?
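
(For context: in Python 3, "filter" returns a lazy iterator, so printing it shows only its repr until you materialize it, e.g. with list(). A minimal demonstration:)

```python
f = filter(None, ["a", "", "b"])
print(f)        # prints something like <filter object at 0x...> — the iterator itself
print(list(f))  # ['a', 'b'] — materializing yields the filtered contents
```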

This is my project's code:

import nltk
import re
import string
from pprint import pprint

corpus = ["The brown fox wasn't that quick and he couldn't win the race",
          "Hey that's a great deal! I just bought a phone for $199",
          "@@You'll (learn) a **lot** in the book. Python is an amazing language !@@"]



# Declare a function for "Tokenizing Text"
def tokenize_text(text):
    sentences = nltk.sent_tokenize(text)
    word_tokens = [nltk.word_tokenize(sentence) for sentence in sentences]
    return word_tokens

# Declare a function for "Removing Special Characters"
def remove_characters_after_tokenization(tokens):
    pattern = re.compile('[{}]'.format(re.escape(string.punctuation)))
    filtered_tokens = list(filter(None, [pattern.sub('', token) for token in tokens]))
    return filtered_tokens


token_list = [tokenize_text(text) for text in corpus]
pprint(token_list)

filtered_list_1 = list(filter(None,[remove_characters_after_tokenization(tokens)
                                for tokens in sentence_tokens])
                   for sentence_tokens in token_list)

print(type(filtered_list_1))
print(len(filtered_list_1))
print(filtered_list_1)
asked by brelian
  • @JETM, Thanks for your reply, but how can I check this duplication if exists? – brelian Jan 26 '17 at 13:39
    The culprit is the line where you create filtered_list_1. This line iterates over token_list, not over the iterator returned by filter, so the filter iterator is left uniterated. – EvertW Jan 26 '17 at 13:42
  • @brelian Sorry, I'm not sure what you're asking? That's a comment SO auto-inserts when someone flags as a duplicate. It basically means I thought that person had the same issue you do. –  Jan 26 '17 at 13:51
  • @JETM Okay, No problem. Actually I had to create a filter for each sentence_tokens in token_list. – brelian Jan 26 '17 at 13:53

1 Answer

The following line creates a filter object for each sentence_tokens in token_list; the outer list() consumes the generator expression, not the filter objects themselves, so you end up with a list of unevaluated filter objects:

filtered_list_1 = list(filter(None, [remove_characters_after_tokenization(tokens)
                                for tokens in sentence_tokens])
                   for sentence_tokens in token_list)

Perhaps you wanted to create a list of lists:

filtered_list_1 = list(filter(None, ([remove_characters_after_tokenization(tokens)
                                      for tokens in sentence_tokens]
                                     for sentence_tokens in token_list)))
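
The corrected pattern can be checked without NLTK. A minimal sketch, using a toy clean function (my name, not from the question) in place of remove_characters_after_tokenization, and hand-made token lists instead of nltk output:

```python
import re
import string

# Toy stand-in for remove_characters_after_tokenization: strips punctuation
# from each token and drops tokens that become empty.
pattern = re.compile('[{}]'.format(re.escape(string.punctuation)))
def clean(tokens):
    return list(filter(None, (pattern.sub('', t) for t in tokens)))

# Hand-made stand-in for token_list: sentences of word tokens per document.
token_list = [[["Hey", "!"], ["Great", "deal", "."]],
              [["Python", "is", "amazing", "!"]]]

# filter() is applied once, to the outer generator of (truthy) lists,
# so list() materializes actual lists rather than filter objects.
filtered = list(filter(None, ([clean(tokens) for tokens in sentence_tokens]
                              for sentence_tokens in token_list)))
print(filtered)
# [[['Hey'], ['Great', 'deal']], [['Python', 'is', 'amazing']]]
```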
answered by Elisha