The mode w+ for open truncates the file every time it is opened. This is why the earlier lines are lost and only the last one remains.
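A minimal sketch of the difference, assuming the output file is reopened once per line written. The helper write_lines and the demo file names are made up for illustration and are not taken from your code:

```python
import os
import tempfile

def write_lines(path, mode, lines):
    """Reopen `path` with `mode` once per line, then return the file's lines."""
    for line in lines:
        with open(path, mode) as f:
            f.write("%s\n" % line)
    with open(path) as f:
        return f.read().splitlines()

tmpdir = tempfile.mkdtemp()
# 'w+' empties the file on every open, so only the last write survives.
truncated = write_lines(os.path.join(tmpdir, "w_demo.txt"), "w+", ["a", "b", "c"])
# 'a' keeps the existing content and adds to the end.
appended = write_lines(os.path.join(tmpdir, "a_demo.txt"), "a", ["a", "b", "c"])
print(truncated)  # ['c']
print(appended)   # ['a', 'b', 'c']
```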
Another small problem is that this way of joining the path and the file name is not portable. You should use os.path.join for that purpose.
with open(os.path.join(os.getcwd(), "ally_" + i + ".txt"), 'a') as f:
    f.write("%s\n" % file1)
Another issue can be weak performance when there are many directories and files.
Your code runs through the filenames in the directory once for each extension, and it opens the output file again and again.
One more issue is the extension check. In most cases the extension can be determined by checking the ending of the file name, but sometimes this is misleading. E.g. '.doc' is an extension, but in the filename 'Medoc' the ending 'doc' is just three letters of the name.
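To illustrate, here is a small sketch comparing a plain suffix check with os.path.splitext. The helper name has_extension is hypothetical, chosen only for this example:

```python
import os

def has_extension(filename, ext):
    """Return True only if `ext` is the real extension, i.e. what follows the last dot."""
    return os.path.splitext(filename)[1] == "." + ext

# A plain suffix check is fooled by a name that merely ends in the same letters.
print("Medoc".endswith("doc"))             # True
print(has_extension("Medoc", "doc"))       # False
print(has_extension("report.doc", "doc"))  # True
```

Note that this comparison is case sensitive, so 'REPORT.DOC' would not match unless you lowercase the extension first.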
Here is an example solution that addresses these problems:
import os

exts = ['ppt', 'pptx', 'doc', 'docx', 'txt', 'pdf', 'epub']
outfiles = {}  # maps each extension to its open output file

# Walk the tree only once and sort the files by extension on the way.
for root, dirnames, filenames in os.walk('.'):
    for filename in filenames:
        _, ext = os.path.splitext(filename)
        ext = ext[1:]  # we do not need "."
        if ext in exts:
            file1 = os.path.join(root, filename)
            # Open each output file only the first time its extension is seen.
            if ext not in outfiles:
                outfiles[ext] = open(os.path.join(os.getcwd(), "ally_" + ext + ".txt"), 'a')
            outfiles[ext].write("%s\n" % file1)

# values() works in both Python 2 and 3 (iteritems() exists only in Python 2).
for f in outfiles.values():
    f.close()