I use the following code:
from collections import defaultdict
import sys
import os

for doc in os.listdir('path1'):
    doc1 = "path1" + doc
    doc2 = "path2" + doc
    doc3 = "path3" + doc
    with open(doc1, "r") as words:
        sent = words.read().split()
        print sent
        linenos = {}
        with open(doc2, "r") as f1:
            for i, line in enumerate(f1):
                for word in sent:
                    if word in line:
                        if word in linenos:
                            linenos[word].append(i + 1)
                        else:
                            linenos[word] = [i + 1]
        matched2 = []
        for word in sent:
            if word in linenos:
                matched2.append('%s %r' % (word, linenos[word][0]))
            else:
                matched2.append('%s <does not exist>' % word)
    with open(doc3, "w") as f1:
        f1.write(', '.join(matched2))
path1 contains files named file1.title, file2.title, and so on, up to file240.title.
Similarly, path2 contains files named file1.txt, file2.txt, and so on, up to file240.txt.
For example:
file1.title will have data like:
military troop deployment number need
file1.txt will have:
foreign 1242
military 23020
firing 03848
troop 2939
number 0032
dog 1234
cat 12030
need w1212
OUTPUT:
path3/file1.txt
military 2, troop 4, deployment <does not exist>, number 5, need 8
Basically, the code reads the words from file1.title and, for each word, reports the number of the first line in file1.txt that contains it. It works fine when I give it a single pair of files at a time, but I need to run it over a folder full of documents.
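To make the single-file behaviour concrete, here is a minimal sketch of that lookup run on the example data above. The helper names `first_line_numbers` and `format_matches` are made up for illustration and are not part of my script; the matching is the same substring test (`word in line`) that the loop uses.

```python
def first_line_numbers(words, lines):
    """Map each word to the 1-based number of the first line containing it."""
    first_seen = {}
    for lineno, line in enumerate(lines, start=1):
        for word in words:
            # Substring test, as in the original loop; setdefault keeps the first hit.
            if word in line:
                first_seen.setdefault(word, lineno)
    return first_seen


def format_matches(words, first_seen):
    """Build the 'word lineno' / 'word <does not exist>' output line."""
    parts = []
    for word in words:
        if word in first_seen:
            parts.append('%s %d' % (word, first_seen[word]))
        else:
            parts.append('%s <does not exist>' % word)
    return ', '.join(parts)


# Example data copied from file1.title and file1.txt above.
title_words = "military troop deployment number need".split()
txt_lines = ["foreign 1242", "military 23020", "firing 03848", "troop 2939",
             "number 0032", "dog 1234", "cat 12030", "need w1212"]
result = format_matches(title_words, first_line_numbers(title_words, txt_lines))
# result == "military 2, troop 4, deployment <does not exist>, number 5, need 8"
```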
That is, it should read the words from file1.title and get their line numbers from file1.txt, then read the words from file2.title and get their line numbers from file2.txt, and so on.
The problem is that this code cannot pair up files that share a base name but have different extensions: os.listdir('path1') returns names such as file1.title, so doc2 ends in file1.title instead of file1.txt. How should I modify it to get the expected output?
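One direction that might work, as a rough sketch only: split each listed name into base name and extension with `os.path.splitext`, then rebuild the matching paths with `os.path.join`. The function name `paired_paths` and its parameters are hypothetical, and writing the output as `<base>.txt` in path3 is an assumption taken from the example output above.

```python
import os


def paired_paths(title_dir, txt_dir, out_dir):
    """Yield (title_path, txt_path, out_path) for every .title file in title_dir."""
    for name in sorted(os.listdir(title_dir)):
        # "file1.title" -> ("file1", ".title")
        stem, ext = os.path.splitext(name)
        if ext != '.title':
            continue  # skip anything that is not a .title file
        yield (os.path.join(title_dir, name),
               os.path.join(txt_dir, stem + '.txt'),
               os.path.join(out_dir, stem + '.txt'))
```

The loop in the script would then iterate over `paired_paths('path1', 'path2', 'path3')` and use the three returned paths in place of doc1, doc2 and doc3. Using `os.path.join` also takes care of the path separator that plain string concatenation leaves out.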