
I am trying to get a list of strings with the file path and the file name. At the moment I only get the file names into the list.

Code:

hamFileNames = os.listdir("train_data\ham")

Output:

['0002.1999-12-13.farmer.ham.txt', 
 '0003.1999-12-14.farmer.ham.txt', 
 '0005.1999-12-14.farmer.ham.txt']

I would want an output similar to this:

['train_data\ham\0002.1999-12-13.farmer.ham.txt',
 'train_data\ham\0003.1999-12-14.farmer.ham.txt',
 'train_data\ham\0005.1999-12-14.farmer.ham.txt']
ShadowRanger
user

2 Answers


If you're on Python 3.5 or higher, skip os.listdir in favor of os.scandir, which is both more efficient and does the work for you (path is an attribute of the result objects):

hamFileNames = [entry.path for entry in os.scandir(r"train_data\ham")]

This also lets you cheaply filter (scandir includes some file info for free, without stat-ing the file), e.g. to keep only files (no directories or special file-system objects):

hamFileNames = [entry.path for entry in os.scandir(r"train_data\ham") if entry.is_file()]
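To make the filtering concrete, here is a minimal, self-contained sketch. The helper name `list_file_paths` and the temporary directory layout are made up for illustration; only `os.scandir`, `entry.path`, and `entry.is_file()` come from the answer above. Note that using `os.scandir` in a `with` statement needs Python 3.6 or later.

```python
import os
import tempfile


def list_file_paths(directory):
    """Return the sorted paths of regular files directly inside `directory`.

    Hypothetical helper for illustration; subdirectories are skipped.
    """
    # The `with` form (Python 3.6+) closes the directory handle promptly.
    with os.scandir(directory) as entries:
        return sorted(entry.path for entry in entries if entry.is_file())


# Build a throwaway directory containing two files and one subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0002.ham.txt", "0003.ham.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "subdir"))

    for path in list_file_paths(tmp):
        print(path)  # each path already includes the directory prefix
```

Sorting is optional; it is included here only because `os.scandir` yields entries in arbitrary order.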

If you're on 3.4 or below, you may want to look at the PyPI scandir module (which provides the same API on earlier Python).

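Also note: I used a raw string for the path. While \h happens to work without it (Python does not recognize it as an escape sequence, so the backslash is kept, although Python 3.6 and later flag such unrecognized escapes with a warning), you should always use raw strings for Windows path literals. Otherwise you'll get a nasty shock when you try to use "train_data\foo", where \f is the ASCII form feed character. r"train_data\foo" works just fine, because the r prefix stops backslashes from being interpreted as escape sequences. The one limitation is that a raw string literal cannot end in a single backslash.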
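The difference is easy to see by comparing the literals directly. This is a small sketch using the same "train_data\foo" example; the variable names are only for illustration.

```python
import os

escaped = "train_data\foo"    # \f is interpreted as a form feed character
raw = r"train_data\foo"       # raw string: the backslash is kept literally
doubled = "train_data\\foo"   # escaping the backslash gives the same result

print(repr(escaped))   # 'train_data\x0coo'
print(repr(raw))       # 'train_data\\foo'
print(raw == doubled)  # True

# Letting os.path.join insert the separator avoids the issue entirely.
joined = os.path.join("train_data", "foo")
print(joined)
```

The last line prints a backslash-separated path on Windows and a forward-slash path elsewhere, since `os.path.join` uses the platform's separator.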

ShadowRanger

Since you have access to the directory path you could just do:

ham_dir = r"train_data\ham"
output = list(map(lambda p: os.path.join(ham_dir, p), os.listdir(ham_dir)))

or simpler

output = [os.path.join(ham_dir, p) for p in os.listdir(ham_dir)]

Here os.path.join joins the directory path with each filename inside it, inserting the platform's path separator between them.
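As a runnable check of this approach, the sketch below builds a temporary directory and joins each name returned by `os.listdir` onto it. The helper name `listdir_with_paths` and the file names are assumptions for the demo, not part of the original question.

```python
import os
import tempfile


def listdir_with_paths(directory):
    """Return sorted entries of `directory`, each prefixed with the directory.

    Hypothetical helper; unlike an is_file() filter, this keeps subdirectories.
    """
    return sorted(os.path.join(directory, name) for name in os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("0002.ham.txt", "0003.ham.txt"):
        open(os.path.join(tmp, name), "w").close()

    paths = listdir_with_paths(tmp)
    # Every result starts with the directory and ends with the bare file name.
    print([os.path.basename(p) for p in paths])  # ['0002.ham.txt', '0003.ham.txt']
    print(all(os.path.dirname(p) == tmp for p in paths))  # True
```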

CakePlusPlus