
I'm trying to read one text file (foo1.txt), remove all the NLTK-defined stopwords, and write the result to another file (foo2.txt). The code is as follows. Required import: from nltk.corpus import stopwords

def stop_words_removal():
    with open("foo1.txt") as f:
        reading_file_line = f.readlines()  # entire content, returned as a list
        #print reading_file_line  # list
        reading_file_info = [item.rstrip('\n') for item in reading_file_line]
        #print reading_file_info  # list with '\n' stripped
        #print ' '.join(reading_file_info)
        '''-----------------------------------------'''
        # Filtering & converting to lowercase
        for i in reading_file_info:
            words_filtered = [e.lower() for e in i.split() if len(e) >= 4]
            print words_filtered

        '''-----------------------------------------'''
        '''removing the stop words from the file'''
        word_list = words_filtered[:]
        #print word_list
        for word in words_filtered:
            if word in nltk.corpus.stopwords.words('english'):
                print word
                print word_list.remove(word)

        '''-----------------------------------------'''
        '''write the output in a file'''
        z = ' '.join(words_filtered)
        out_file = open("foo2.txt", "w")
        out_file.write(z)
        out_file.close()

The problem is that the second part of the code, "removing the stop words from the file", does not work. Any suggestions would be greatly appreciated. Thanks.
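Three things in the code above produce that symptom: words_filtered is reassigned on every pass of the first loop, so only the last line survives; remove() is applied to the copy word_list, but it is words_filtered that gets joined and written; and list.remove() returns None, which is what the print shows. Below is a minimal sketch of the stop-word step done per line, assuming Python 3. SAMPLE_STOPWORDS and filter_lines are hypothetical names; the small set stands in for stopwords.words('english') so the snippet runs without NLTK.

```python
# Hypothetical stand-in for NLTK's stopwords.words('english');
# a set makes each membership test O(1).
SAMPLE_STOPWORDS = {"i", "a", "this", "there", "is", "my", "he"}


def filter_lines(lines, stop, min_len=4):
    """Return one filtered word list per input line (nothing is overwritten)."""
    result = []
    for line in lines:
        # Keep words of at least min_len characters, lowercased, as in the question.
        words = [w.lower() for w in line.split() if len(w) >= min_len]
        # Build a new list rather than calling remove() on a copy of another list.
        result.append([w for w in words if w not in stop])
    return result


lines = ["I a Love this car there", "This a view is amazing there"]
for words in filter_lines(lines, SAMPLE_STOPWORDS):
    print(" ".join(words))
```

Collecting each line's result in its own list keeps the line-by-line separation the asker later says they need.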

Example Input File: 
'I a Love this car there', 'positive',
'This a view is amazing there', 'positive',
'He is my best friend there', 'negative'

Example Output:
['love', "car',", "'positive',"]
['view', "amazing',", "'positive',"]
['best', "friend',", "'negative'"]
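The quotes and commas in that output (car', and 'positive',) appear because str.split() only splits on whitespace, so punctuation stays attached to the neighbouring word and also counts toward the length test. One way to drop it before splitting, sketched here for Python 3 (the answer uses the Python 2 form translate(None, string.punctuation)), is a deletion table built with str.maketrans. The name tokens is hypothetical, and stopwords are not removed in this step.

```python
import string

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def tokens(line, min_len=4):
    """Lowercase, strip punctuation, split on whitespace, keep words >= min_len."""
    return [w for w in line.lower().translate(STRIP_PUNCT).split() if len(w) >= min_len]


print(tokens("'I a Love this car there', 'positive',"))
# ['love', 'this', 'there', 'positive']
```

With the punctuation gone, "car" is only three characters and is dropped by the length filter. Note that this also deletes apostrophes inside words, so a contraction such as "don't" becomes "dont".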

I tried what was suggested in this link, but none of the suggestions worked.

J4cK
  • Are you sure this is the output you want? Do you need the punctuation signs? – elyase May 17 '13 at 16:42
    @elyase Thanks for the reply. Actually, I don't need the square brackets, but I do need each line kept separate. The code you posted works only for the last line of the file. I want to remove the stop words from every line of the text. – J4cK May 17 '13 at 16:56
    @elyase, Thanks mate. The code you wrote works like a charm. As you mentioned, I just imported __future__ and string, since I'm using Python 2.7. Thanks again :) – J4cK May 17 '13 at 20:34

1 Answer


This is what I would do, inside your function:

import string
from nltk.corpus import stopwords

stop = set(stopwords.words('english'))  # build the stopword set once, not once per word
with open('input.txt','r') as inFile, open('output.txt','w') as outFile:
    for line in inFile:
        print(' '.join([word for word in line.lower().translate(None, string.punctuation).split()
              if len(word) >= 4 and word not in stop]), file=outFile)

Don't forget to add:

from __future__ import print_function                   

if you are on Python 2.x.
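On Python 3 the same per-line approach needs two adjustments: print is already a function, and str.translate takes a single table, so translate(None, string.punctuation) becomes translate(str.maketrans('', '', string.punctuation)). Below is a sketch under those assumptions. The function names are hypothetical, and load_english_stopwords() assumes NLTK is installed and its stopwords corpus has been fetched with nltk.download('stopwords').

```python
import string

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def load_english_stopwords():
    # Assumes NLTK is installed and nltk.download('stopwords') has been run.
    from nltk.corpus import stopwords
    return set(stopwords.words("english"))


def clean_line(line, stop, min_len=4):
    """Lowercase, strip punctuation, and drop short words and stopwords."""
    words = line.lower().translate(STRIP_PUNCT).split()
    return " ".join(w for w in words if len(w) >= min_len and w not in stop)


def remove_stopwords_file(in_path, out_path, stop):
    """Write one cleaned line to out_path for every line in in_path."""
    with open(in_path) as in_file, open(out_path, "w") as out_file:
        for line in in_file:
            print(clean_line(line, stop), file=out_file)


# Usage (requires NLTK and its stopwords corpus):
# remove_stopwords_file("foo1.txt", "foo2.txt", load_english_stopwords())
```

Passing the stopword set in as a parameter keeps the text-cleaning logic independent of NLTK, so it can be checked with a small hand-written set.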

elyase