1

I use the following code:

import os

for doc in os.listdir('path1'):
    # All three paths reuse the same file name, including its .title extension
    doc1 = "path1" + doc
    doc2 = "path2" + doc
    doc3 = "path3" + doc

    # Words to look up, read from the .title file
    with open(doc1, "r") as words:
        sent = words.read().split()
        print(sent)
    linenos = {}

    # Record every (1-based) line number on which each word occurs
    with open(doc2, "r") as f1:
        for i, line in enumerate(f1):
            for word in sent:
                if word in line:
                    if word in linenos:
                        linenos[word].append(i + 1)
                    else:
                        linenos[word] = [i + 1]

    # Report the first line number for each word, or a placeholder if absent
    matched2 = []
    for word in sent:
        if word in linenos:
            matched2.append('%s %r' % (word, linenos[word][0]))
        else:
            matched2.append('%s <does not exist>' % word)
    with open(doc3, "w") as f1:
        f1.write(', '.join(matched2))

My path1 contains files named file1.title, file2.title, and so on, up to file240.title.

Similarly, path2 contains files named file1.txt, file2.txt, and so on, up to file240.txt.

For example:

file1.title will have data like:

military  troop deployment number need  

file1.txt will have :

foreign 1242
military 23020
firing  03848
troop 2939
number 0032
dog 1234
cat 12030
need w1212

OUTPUT:

path3/file1.txt

military 2, troop 4, deployment <does not exist>, number 5, need 8

Basically, the code takes the words from file1.title and finds the line number on which each of them appears in file1.txt. It works fine when I give it a single pair of files at a time, but I need it to run over a folder full of documents.
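For reference, the per-file lookup described above can be sketched on the sample data without touching the file system. This is only an illustration: the function name `first_line_numbers` is hypothetical, and it assumes the same substring matching (`word in line`) as the code in the question.

```python
def first_line_numbers(words, lines):
    """Format each word with the first 1-based line number it appears on."""
    first_seen = {}
    for lineno, line in enumerate(lines, start=1):
        for word in words:
            # Substring match, as in the question's code; keep only the first hit
            if word in line and word not in first_seen:
                first_seen[word] = lineno
    parts = []
    for word in words:
        if word in first_seen:
            parts.append("%s %d" % (word, first_seen[word]))
        else:
            parts.append("%s <does not exist>" % word)
    return ", ".join(parts)


# Sample data copied from the file1.title / file1.txt example
title_words = "military  troop deployment number need".split()
txt_lines = [
    "foreign 1242", "military 23020", "firing  03848", "troop 2939",
    "number 0032", "dog 1234", "cat 12030", "need w1212",
]
result = first_line_numbers(title_words, txt_lines)
```

Because the match is a substring test, a word such as "cat" would also match a line containing "category"; comparing against `line.split()` instead would avoid that, if whole-word matching is what is wanted.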

That is, it should read the words from file1.title and get their line numbers from file1.txt, then read the words from file2.title and get their line numbers from file2.txt, and so on.

The problem is that this code cannot pair files that share a base name but have different extensions (for example file1.title and file1.txt). How should I modify it to get the output shown above?

Ana_Sam
  • Possible duplicate of [changing file extension in python](http://stackoverflow.com/questions/2900035/changing-file-extension-in-python) – R Nar Nov 17 '15 at 17:18
  • No. I don't want to rename the files; I want to use two files with different extensions to get the line numbers – Ana_Sam Nov 17 '15 at 17:19
  • When asking a question on SO, try boiling it down to a short, self-contained example. Most of the code and explanation has nothing to do with your actual problem. – Falko Nov 17 '15 at 17:25
  • Sorry, I am still getting the hang of Stack Overflow. I will do that from now on. – Ana_Sam Nov 17 '15 at 17:28

3 Answers

2

I guess you're asking how to replace the extension in a filename string, as follows:

doc2 = "path2" + doc[:-6] + ".txt"

This strips the 6 characters ".title" from doc and adds the extension ".txt".
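As a variation on the slice above, the standard-library function `os.path.splitext` splits off the extension without hard-coding its length. The sketch below is only an illustration: the helper name `paired_paths` is hypothetical, and the directory names `path2` and `path3` are the placeholders from the question.

```python
import os


def paired_paths(title_name, txt_dir="path2", out_dir="path3"):
    """Return (txt_path, out_path) matching a '.title' file name."""
    # 'file1.title' -> ('file1', '.title')
    base, _ext = os.path.splitext(title_name)
    txt_name = base + ".txt"
    # os.path.join inserts the path separator that plain "+" would omit
    return os.path.join(txt_dir, txt_name), os.path.join(out_dir, txt_name)


doc2, doc3 = paired_paths("file1.title")
```

Inside the loop from the question, `doc2` and `doc3` could be built this way for each `doc` returned by `os.listdir('path1')`.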

Falko
1

Are you looking to do something like this?

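import os

# Base names of all files in the current directory ending in .txt or .title
names = set(os.path.splitext(fname)[0] for fname in os.listdir('.')
            if os.path.splitext(fname)[1] in ('.txt', '.title'))
for name in names:
    # Open both files of the pair and close them again once read
    with open(name + '.txt') as txt_file, open(name + '.title') as title_file:
        f1 = txt_file.read()
        f2 = title_file.read()
    # Do whatever with the file contents

Adam Acosta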
  • I wanted to strip the extension and perform the necessary functions. The previous answer was what I wanted. Thanks for your time – Ana_Sam Nov 17 '15 at 17:42
0

I think you just need to write the full file name, including the extension, in calls like open(docx, 'w'). For example, replace doc1 with 'file1.title' and doc2 with 'file1.txt'. I don't know if that's what you're already doing, but the extension matters when you open a file.

Seraf
  • I want this process to be performed for a folder full of files and not on single file at a time. It works for single files – Ana_Sam Nov 17 '15 at 17:20