
I have a directory with ~2200 text files in it. I need to delete any text file that does not contain the specific words I've defined. Can someone please look at this code and suggest how to get it working? Right now, when I run it, it says it can't find the directory "C".

Also, I want to make sure that this runs for every file within that directory. Do I need to include a next function?

import os

path = r'C:\Users\user\Desktop\AFL codes to test'
words = ['buy', 'sell']

for root, dirs, files in os.walk(path):
    for file in path:
        if not any(words in file for words in words):
            os.remove(file)

Also, here is the full traceback:

runfile('C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py', wdir='C:/Users/user/.spyder-py3')
Traceback (most recent call last):

  File "<ipython-input-23-dbc80e182b2b>", line 1, in <module>
    runfile('C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py', wdir='C:/Users/user/.spyder-py3')

  File "C:\Users\user\Anaconda31\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\user\Anaconda31\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py", line 9, in <module>
    os.remove(file)

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C'

This is the error after trying shutil.rmtree:

runfile('C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py', wdir='C:/Users/user/.spyder-py3')
Traceback (most recent call last):

  File "<ipython-input-16-dbc80e182b2b>", line 1, in <module>
    runfile('C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py', wdir='C:/Users/user/.spyder-py3')

  File "C:\Users\user\Anaconda31\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\user\Anaconda31\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/user/.spyder-py3/DELETE FILES THAT DONT CONTAIN CERTAIN WORDS.py", line 12, in <module>
    shutil.rmtree(full_path)

  File "C:\Users\user\Anaconda31\lib\shutil.py", line 494, in rmtree
    return _rmtree_unsafe(path, onerror)

  File "C:\Users\user\Anaconda31\lib\shutil.py", line 376, in _rmtree_unsafe
    onerror(os.listdir, path, sys.exc_info())

  File "C:\Users\user\Anaconda31\lib\shutil.py", line 374, in _rmtree_unsafe
    names = os.listdir(path)

NotADirectoryError: [WinError 267] The directory name is invalid: 'C:/Users/user/Desktop/AFL codes to test/newfile1.txt'
Yousuf

1 Answer


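You can replace the backslashes with regular slashes, although the raw string is already a valid path. The 'C' in the error comes from `for file in path:`, which iterates over the characters of the path string instead of the file names.

path = r'C:\Users\user\Desktop\AFL codes to test'

can also be written as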

path = 'C:/Users/user/Desktop/AFL codes to test'
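As a quick check that both spellings name the same location, here is a minimal sketch using `pathlib.PureWindowsPath`, which applies Windows path rules on any operating system. The variable names are only illustrative.

```python
from pathlib import PureWindowsPath

# The raw string keeps the backslashes literally; the second form uses forward slashes.
raw_path = PureWindowsPath(r'C:\Users\user\Desktop\AFL codes to test')
slash_path = PureWindowsPath('C:/Users/user/Desktop/AFL codes to test')

# PureWindowsPath normalizes the separators, so the two objects compare equal.
same_path = raw_path == slash_path
print(same_path)
```

Since the comparison holds, the choice of slashes is a matter of style here rather than the cause of the error.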

EDIT: here is the full code that should get you going:

import os

path = 'C:/Users/user/Desktop/AFL codes to test'
words = ['buy', 'sell']

for each_file in os.listdir(path):
    full_path = os.path.join(path, each_file)
    # Close the file before deleting it; Windows refuses to remove an open file.
    with open(full_path, 'r', encoding="utf-8") as f:
        each_file_content = f.read()
    # Delete the file only if none of the words appear in its content.
    if not any(word in each_file_content for word in words):
        os.unlink(full_path)
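For anyone who prefers to keep the `os.walk` loop from the question, a variant might look like the sketch below. The function name `delete_files_without_words`, the `dry_run` flag, and `errors='ignore'` are assumptions added for illustration, not part of the question or the code above. No `next` call is needed, since the `for` loops already visit every file.

```python
import os

def delete_files_without_words(path, words, dry_run=True):
    """Return the files under `path` whose content has none of `words`.

    The files are only deleted when dry_run is False (hypothetical safety flag).
    """
    unmatched = []
    for root, dirs, files in os.walk(path):
        for name in files:  # iterate over the file names, not over the path string
            full_path = os.path.join(root, name)
            # Assumption: undecodable bytes are skipped rather than raising an error.
            with open(full_path, 'r', encoding='utf-8', errors='ignore') as f:
                content = f.read()
            # The file is closed here, so Windows will allow it to be removed.
            if not any(word in content for word in words):
                unmatched.append(full_path)
                if not dry_run:
                    os.remove(full_path)
    return unmatched
```

Calling it with `dry_run=True` first returns the list of files that would be removed, which seems prudent before deleting anything from a folder of ~2200 files.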
Ali Yılmaz
  • @Netwave He gets the error "Right now, when i run this it says it can't find the directory 'C'". I believe the backslashes in his string cause this. What's your opinion? – Ali Yılmaz May 12 '18 at 16:39
  • He is iterating over the `path` string, so the first element in it is `'C'`. Unless there is a directory or file with that name in his cwd, it cannot be found. – Netwave May 12 '18 at 16:41
  • That's right. I edited my code. Can you try it and report whether it works? – Ali Yılmaz May 12 '18 at 16:44
  • I do not know how `os.walk(path)` works, so I used `os.listdir`. I hope that figures it out for you. – Ali Yılmaz May 12 '18 at 16:48
  • It looks like it's causing a "PermissionError: [WinError 32]" when it reaches the os.remove, probably due to listdir still "using" the file. How would I solve this? https://stackoverflow.com/questions/27215462/permissionerror-winerror-32-the-process-cannot-access-the-file-because-it-is?rq=1 – Yousuf May 12 '18 at 18:20
  • Can you share the full error message? If it's os.listdir, you can assign the result of os.listdir to a list and iterate over it. I updated my answer so you can check it out. – Ali Yılmaz May 12 '18 at 19:23
  • check this out: https://stackoverflow.com/questions/1213706/what-user-do-python-scripts-run-as-in-windows – Ali Yılmaz May 12 '18 at 19:28
  • Can you try with `shutil.rmtree` and let me know about the results? I don't use Windows so I can't test it out. Updated the code in my answer. – Ali Yılmaz May 12 '18 at 19:29
  • It turns out that `shutil.rmtree` is used for directories only. Can you try with `os.unlink(full_path)`? – Ali Yılmaz May 13 '18 at 07:07
  • Still doesn't work. Can we add a read to it instead? Then we can just do each_file_content.close(); that way, there is no permission error, since technically when you use "open" you are using the file. – Yousuf May 13 '18 at 18:52
  • WORKED! I just had to add encoding as utf-8. Thanks for not giving up :) – Yousuf May 13 '18 at 21:29
  • @Yousuf As far as I can recall, Python uses utf-8 by default; you must have changed it somewhere else. Anyway, glad you made it work. Cheers :) – Ali Yılmaz May 13 '18 at 21:49
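Netwave's point about iterating over the `path` string can be checked with a small sketch. It only inspects the string and touches no files; the name `first_three` is illustrative.

```python
path = r'C:\Users\user\Desktop\AFL codes to test'

# Looping over a string yields its characters one at a time,
# so `for file in path:` starts with 'C', then ':', then a backslash.
first_three = [ch for ch in path][:3]
print(first_three)  # ['C', ':', '\\']
```

This is why `os.remove(file)` in the question was called with `'C'` rather than with a file name.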