
I have some code that opens a text file containing a list of paths to files, like so:

C:/Users/User/Desktop/mini_mouse/1980

C:/Users/User/Desktop/mini_mouse/1982

C:/Users/User/Desktop/mini_mouse/1984

It then opens these files individually, line-by-line, and does some filtering to the files. I then want it to output the result to a completely different folder called:

output_location = 'C:/Users/User/Desktop/test2/'

As it stands, my code writes the result to the folder the original file was opened from, i.e., if it opens the file C:/Users/User/Desktop/mini_mouse/1980, the output ends up in that same folder under the name '1980_filtered'. I would like the output to go into output_location instead. Can anyone see where I am going wrong? Any help would be greatly appreciated! Here is my code:

import os

def main():
    stop_words_path = 'C:/Users/User/Desktop/NLTK-stop-word-list.txt'
    stopwords = get_stop_words_list(stop_words_path)
    output_location = 'C:/Users/User/Desktop/test2/'

    list_file = 'C:/Users/User/Desktop/list_of_files.txt'

    with open(list_file, 'r') as f:
        for file_name in f:
            #print(file_name)
            if file_name.endswith('\n'):
                file_name = file_name[:-1]
            #print(file_name)
            file_path = os.path.join(file_name)  # joins the new path of the file to the current file in order to access the file

            filestring = ''  # file string which will take all the lines in the file and add them to itself
            with open(file_path, 'r') as f2:  # open the file
                print('just opened ' + file_name)
                print('\n')
                for line in f2:  # read file line by line

                    x = remove_stop_words(line, stopwords)  # remove stop words from line
                    filestring += x  # add newly filtered line to the file string
                    filestring += '\n'  # Create new line

            new_file_path = os.path.join(output_location, file_name) + '_filtered'  # creates a new file of the file that is currently being filtered of stopwords
            with open(new_file_path, 'a') as output_file:  # opens output file
                output_file.write(filestring)


if __name__ == "__main__":
    main()
Fordo

2 Answers

1

Assuming you're using Windows (your paths look like a normal Windows filesystem), pathnames there are conventionally written with backslashes, so I changed yours to use them. Note that this applies only on Windows. Each backslash also has to be doubled, because a single backslash in a Python string literal starts an escape sequence.

import os

def main():
    stop_words_path = 'C:\\Users\\User\\Desktop\\NLTK-stop-word-list.txt'
    stopwords = get_stop_words_list(stop_words_path)
    output_location = 'C:\\Users\\User\\Desktop\\test2\\'

    list_file = 'C:\\Users\\User\\Desktop\\list_of_files.txt'

    with open(list_file, 'r') as f:
        for file_name in f:
            #print(file_name)
            if file_name.endswith('\n'):
                file_name = file_name[:-1]
            #print(file_name)
            file_path = os.path.join(file_name)  # joins the new path of the file to the current file in order to access the file

            filestring = ''  # file string which will take all the lines in the file and add them to itself
            with open(file_path, 'r') as f2:  # open the file
                print('just opened ' + file_name)
                print('\n')
                for line in f2:  # read file line by line

                    x = remove_stop_words(line, stopwords)  # remove stop words from line
                    filestring += x  # add newly filtered line to the file string
                    filestring += '\n'  # Create new line

            new_file_path = os.path.join(output_location, file_name) + '_filtered'  # creates a new file of the file that is currently being filtered of stopwords
            with open(new_file_path, 'a') as output_file:  # opens output file
                output_file.write(filestring)


if __name__ == "__main__":
    main()
Dan Lewis
  • Thank you for your answer! Unfortunately, the result is the same :/ The output still goes to the mini_mouse folder instead of the output_location. – Fordo Mar 05 '19 at 19:49
  • Ok, at a guess, I'd say that it was a problem with you not calling `file.close()` on any of the files, but I'm not entirely sure. Give it a shot and tell me if it works for you. – Dan Lewis Mar 05 '19 at 19:55
  • I don't think that is the problem, as `with open()` does that automatically as far as I know. – Fordo Mar 05 '19 at 19:57
  • Ok. What does `_filtered` do? Does it add on to the end of the filename? – Dan Lewis Mar 05 '19 at 19:59
  • Yes, `_filtered` adds that onto the end of the filename to create something like `1980_filtered`. – Fordo Mar 05 '19 at 20:02
  • Ok, you'll need to add '.txt' onto the end of that, along with all the other filenames in your program, as Python needs an extension. Try adding '.txt' onto all the file paths referenced. – Dan Lewis Mar 05 '19 at 20:06
  • Does that include the `output_location`? Like so: `output_location = 'C:\\Users\\User\\Desktop\\test2.txt\\'` `test2` is not a text file, rather a folder where I would like to put the filtered files into. – Fordo Mar 05 '19 at 20:10
  • No, my suggestion only applies to text files. – Dan Lewis Mar 05 '19 at 20:12
  • Unfortunately, that outputs the resulting file into the `test2` file, it just changes the format of the file itself. – Fordo Mar 05 '19 at 20:16
  • Why are you using `os.path.join()`? Couldn't you just concatenate the strings? Like this: `new_file_path = output_location + file_name + '_filtered.txt'` Keep in mind that if you use this, `file_name` will have to be without the extension. – Dan Lewis Mar 05 '19 at 20:19
  • Doing that gives me this error: `OSError: [Errno 22] Invalid argument: 'C:\\Users\\User\\Desktop\\test2C:/Users/User/Desktop/mini_mouse/1980_filtered.txt'` – Fordo Mar 05 '19 at 20:28
  • Ah I see the problem. The file isn't currently created, so you can't use `'a'` as your filemode. On the last `open()` function, change the mode to `'w'`, which will automatically create the file. – Dan Lewis Mar 05 '19 at 20:32
  • Fantastic! That did the trick! Thank you so much for your help and sticking with me! :) – Fordo Mar 05 '19 at 20:35
  • No problem! Would be happy to help you again! – Dan Lewis Mar 05 '19 at 20:42
  • Hi, I just noticed that you haven't marked my answer as correct. It'd really help me out if you could do that, as I did fix your problem. Thanks! – Dan Lewis Mar 06 '19 at 19:49
  • Oh sorry about that! Thanks again. – Fordo Mar 08 '19 at 18:10
1

Based on your code, the issue looks to be in this line:

new_file_path = os.path.join(output_location, file_name) + '_filtered'

With Python's os.path.join(), any argument that is an absolute path (or, on Windows, one that starts with a drive letter) discards everything before it, and the join restarts from that argument. Since file_name is read directly from list_of_files.txt, and each entry there is an absolute path on the C: drive, every call to os.path.join() drops output_location and returns the original file path.

See Why doesn't os.path.join() work in this case? for a better explanation of this behavior.
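
As a quick illustration of that behavior, here is a minimal sketch using the two paths from the question. It imports ntpath, the module that os.path resolves to on Windows, so the result should be the same on any platform; the variable name joined is only for illustration.

```python
import ntpath  # Windows path rules; this is what os.path is on Windows

output_location = 'C:/Users/User/Desktop/test2/'
file_name = 'C:/Users/User/Desktop/mini_mouse/1980'

# file_name is absolute, so join() throws away output_location entirely
joined = ntpath.join(output_location, file_name) + '_filtered'
print(joined)
```

The printed path still points into mini_mouse, which matches what the question describes.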

When building the output path you could strip the file name, "1980" for instance, from the path "C:/Users/User/Desktop/mini_mouse/1980" and join based on the output_location variable and the isolated file name.
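
One way to do that, as a rough sketch rather than a drop-in fix: take the last path component with basename() and join that onto the output folder. The helper name build_output_path is hypothetical, and ntpath is used here only so the example behaves the same outside Windows; in the question's script os.path should work the same way.

```python
import ntpath  # Windows path rules; on Windows, os.path is this module


def build_output_path(output_location, source_path, suffix='_filtered'):
    """Return a path inside output_location named after source_path's last component."""
    # 'C:/Users/User/Desktop/mini_mouse/1980' -> '1980'
    base_name = ntpath.basename(source_path)
    # base_name is relative, so join() keeps output_location this time
    return ntpath.join(output_location, base_name + suffix)


new_file_path = build_output_path(
    'C:/Users/User/Desktop/test2/',
    'C:/Users/User/Desktop/mini_mouse/1980',
)
print(new_file_path)
```

In the original loop this would replace the line that builds new_file_path; the rest of the code can stay as it is.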

dhw