Hello Stack Overflow community! I've used this community for years to accomplish small one-off projects for work, school, and personal exploration; however, this is the first question I've posted ... so be delicate ;)
I'm trying to read every file in a directory and all of its subdirectories with Python, then accumulate the word counts into one dictionary. Right now the script (see below) reads all the files as required, but it prints a separate result for each file. I'm looking for help accumulating them into one.
Code
import re
import os
import sys
import os.path
import fnmatch
import collections

def search(path):
    if os.path.isdir(path):
        for root, dirs, files in os.walk(path):
            for file in files:
                # Join with root so files in subdirectories resolve correctly
                words = re.findall(r'\w+', open(os.path.join(root, file)).read().lower())
                ignore = ['the','a','if','in','it','of','or','on','and','to']
                counter = collections.Counter(x for x in words if x not in ignore)
                # Runs once per file, so every file gets its own list
                print(counter.most_common(10))
    else:
        words = re.findall(r'\w+', open(path).read().lower())
        ignore = ['the','a','if','in','it','of','or','on','and','to']
        counter = collections.Counter(x for x in words if x not in ignore)
        print(counter.most_common(10))

path = raw_input("Enter file and path")  # Python 2; use input() on Python 3
search(path)
Results
Enter file and path./dirTest
[('this', 1), ('test', 1), ('is', 1), ('just', 1)]
[('this', 1), ('test', 1), ('is', 1), ('just', 1)]
[('test', 2), ('is', 2), ('just', 2), ('this', 1), ('really', 1)]
[('test', 3), ('just', 2), ('this', 2), ('is', 2), ('power', 1),
('through', 1), ('really', 1)]
[('this', 2), ('another', 1), ('is', 1), ('read', 1), ('can', 1),
('file', 1), ('test', 1), ('you', 1)]
Desired Results - Example
[('this', 5), ('another', 1), ('is', 5), ('read', 1), ('can', 1),
('file', 1), ('test', 5), ('you', 1), ('power', 1), ('through', 1),
('really', 2)]
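To make the goal concrete, here is a rough sketch of the behaviour I'm after: a single Counter created once, before the loop, and updated with the words from each file. This is only a sketch under assumptions. It targets Python 3, assumes every file is plain text readable with the default encoding, and the names iter_files, count_words, and IGNORE are made up for illustration.

```python
import collections
import os
import re

# Same stop words as in the script above, as a set for faster lookups.
IGNORE = {'the', 'a', 'if', 'in', 'it', 'of', 'or', 'on', 'and', 'to'}


def iter_files(path):
    """Yield path itself if it is a file, else every file beneath it."""
    if os.path.isdir(path):
        for root, dirs, files in os.walk(path):
            for name in files:
                yield os.path.join(root, name)
    else:
        yield path


def count_words(path, ignore=IGNORE):
    """Return one Counter covering every file found under path."""
    total = collections.Counter()  # created once, outside the loop
    for filename in iter_files(path):
        with open(filename) as handle:
            words = re.findall(r'\w+', handle.read().lower())
        # update() adds to the existing counts instead of replacing them
        total.update(w for w in words if w not in ignore)
    return total
```

With something like this, print(count_words(path).most_common(10)) would print a single combined list instead of one list per file.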
Any guidance would be greatly appreciated!