import os
Current_Directory = os.getcwd() # Should be ...\archive
CORPUS_PATHS = sorted([os.path.join("archive", directories) for directories in os.listdir(Current_Directory)])
filenames = []
for items in CORPUS_PATHS:
    filenames.append(sorted([os.path.join(CORPUS_PATHS, fn) for fn in os.listdir(items)]))

print filenames

I am running this code from a folder called archive. Inside archive there are more folders, and each of these folders contains one or more text files. I want to make a list that includes the path to each of these folders. However, the following error appears:

[Error 3] The system cannot find the path specified:

I currently have the Python script where I wrote this code in the same folder as archive, and it triggers this error. What should I do to stop this error and get all the file paths?

I am pretty bad at using os and I don't use it that often so I apologize if this is a trivial question.

Edit

import os
startpath = "archive"
corpus_path = sorted([os.path.join("archive/", directories) for directories in os.listdir(startpath)])

filenames = []
for items in corpus_path:
    print items
    path = [os.path.join(corpus_path, fn) for fn in os.listdir(items)]
    print path

So I have made some progress, and now corpus_path is essentially a list with the paths to all of the desired folders. All I am trying to do now is get the paths to all of the text files inside these folders, but I still run into issues I don't understand, with errors such as:

File "C:\Users\David\Anaconda\lib\ntpath.py", line 65, in join
result_drive, result_path = splitdrive(path)

File "C:\Users\David\Anaconda\lib\ntpath.py", line 116, in splitdrive
normp = p.replace(altsep, sep)

AttributeError: 'list' object has no attribute 'replace'
David Yi
  • Check the answer [here](http://stackoverflow.com/questions/9220280/python-windowserror-error-3-the-system-cannot-find-the-file-specified-when-tr) – GAVD Sep 03 '15 at 03:24

1 Answer


You must be on a Windows machine. The error comes from os.listdir(), which is not getting a correct path.

In line 3, you are doing os.path.join("archive", directories). You should join the complete path, including the drive (C: or D:), like "c:/archive/foo", or on Linux "/home/root/archive/foo".

Read - Python os.path.join on Windows

os.path.join Usage -

On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
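The rules quoted above can be checked on any operating system with the ntpath module, which is the Windows implementation behind os.path. This is a minimal sketch, assuming Python 3; the paths are made up for illustration and do not need to exist.

```python
import ntpath  # Windows flavour of os.path, importable on any OS

# A bare drive letter adds no separator: the result is relative to the
# current directory on that drive.
drive_relative = ntpath.join("c:", "foo")

# Including the root separator gives an absolute path.
absolute = ntpath.join("c:\\", "foo")

# A later component with its own drive letter discards everything before it.
drive_reset = ntpath.join("c:\\archive", "d:\\other")

# A rooted component without a drive letter keeps the earlier drive.
rooted = ntpath.join("c:\\archive", "\\foo")

print(drive_relative)  # c:foo
print(absolute)        # c:\foo
print(drive_reset)     # d:\other
print(rooted)          # c:\foo
```

The first case is why "d:" joined with "archive" yields d:archive rather than d:\archive.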

Edit:

You are passing the list corpus_path to os.path.join in line 6. That causes the error! Replace corpus_path with items.

I created an archive folder on my D: drive. Under the archive folder I created 3 folders: foo1, foo2 and foo3. Each folder contains 1 or 2 text files. Then I tested your code after the modification, and it works fine. Here is the code:

import os
startpath = "d:archive"
corpus_path = sorted([os.path.join("d:", "archive", directories) for directories in os.listdir(startpath)])

filenames = []
for items in corpus_path:
    print items
    path = [os.path.join(items, fn) for fn in os.listdir(items)]
    print path

output:

d:archive\foo1
['d:archive\\foo1\\foo1.txt.txt', 'd:archive\\foo1\\foo11.txt']
d:archive\foo2
['d:archive\\foo2\\foo2.txt.txt']
d:archive\foo3
['d:archive\\foo3\\foo3.txt.txt']
Gaurav Vichare
  • The thing is that what I am aiming for is not to write a full path but only part of it that comes after the folder archive. My code will also trigger an error because the actual python script that I wrote the code on is in that directory. So I assume that I need to move it out but I don't know how to edit my code to fit that. – David Yi Sep 03 '15 at 17:35
  • @DavidYi If you are not passing the full path to `os.listdir()` (line 6), then how will it list the directories under it? Am I missing anything? The archive folder is inside one of the drives (C: / D: / Home). Inside the archive folder there are more folders, e.g. foo1, foo2 and foo3. The archive folder also contains the Python script. Inside foo1, foo2 and foo3 there are text files. Right? – Gaurav Vichare Sep 04 '15 at 04:42
  • Yes, you are correct. But an issue will occur: /archive/(python script)/... will not be found, because the code assumes that the Python script is another folder. That raises an error, so I need to find a way to get the script ignored. Also, I tried it before without the full path: the current directory is where the script is being run, and the partial path can access the folders I want. – David Yi Sep 04 '15 at 16:08
  • @DavidYi Remove the '/' from line 3 in your edited code and try again, so that line 3 becomes `corpus_path = sorted([os.path.join("archive", directories) for directories in os.listdir(startpath)])` – Gaurav Vichare Sep 05 '15 at 10:14
  • It still doesn't give me what I want. Errors still pop up. – David Yi Sep 05 '15 at 19:00
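
One way to handle the problem raised in these comments (the script sitting in archive next to the folders) is to skip every entry that is not a directory before calling os.listdir() on it. The following is only a sketch under that assumption: the helper name list_text_files is hypothetical, and it takes the start folder as an argument so it works with a relative path such as "archive" or ".".

```python
import os


def list_text_files(startpath):
    """Map each sub-folder of startpath to the sorted paths of its files.

    list_text_files is a hypothetical helper, not part of the os module.
    """
    result = {}
    for name in sorted(os.listdir(startpath)):
        folder = os.path.join(startpath, name)
        if not os.path.isdir(folder):
            # Skip the script itself and any other loose files.
            continue
        result[folder] = sorted(
            os.path.join(folder, fn)
            for fn in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, fn))
        )
    return result
```

If the script is run from inside archive, calling list_text_files(".") should be enough. From the parent folder, list_text_files("archive") gives paths that start with archive.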