
I'm new to coding. I have about 50 notepad files which are transcripts with timestamps, and I want to search through them for certain words or phrases. Ideally I would be able to input a word or phrase and the program would find every occurrence and print it along with its timestamp. The lines in my notepad files look like this:

"then he went to the office

00:03

but where did he go afterwards?

00:06

to the mall

00:08"

I just want a starting point. I have fun figuring things out on my own, but I need somewhere to start.

A few questions I have:

How can I ensure that the user input will not be case sensitive? How can I open many text files systematically with a loop? I have titled the text files {1,2,3,4...} to try to make this easy to implement but I still don't know how to do it.

  • These are good questions, but you really should ask only one question per question. I'd imagine it should not be too hard to find an existing question at least about the first one. See also [How to ask.](/help/how-to-ask) – tripleee Jan 02 '21 at 09:52

4 Answers

from os import listdir
import re

def find_word(text, search):
    # Whole-word, case-insensitive match; re.escape keeps any punctuation in the phrase literal
    pattern = r'\b' + re.escape(search) + r'\b'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Put all the txt files in one folder (in my case it was D:/onr/1738349/txt/)
search = 'word'  # the word or phrase you want to search for

result_file = "D:/onr/1738349/result.txt"  # output file; change the path to suit your machine
transcript_folder = "D:/onr/1738349/txt"  # all the transcript files go here

# Keep the result file outside the transcript folder so it is not read as a transcript
with open(result_file, "w") as f:
    for filename in listdir(transcript_folder):  # every file in the folder (50 files as you said)
        if not filename.endswith('.txt'):
            continue  # skip anything that is not a transcript
        with open(transcript_folder + '/' + filename) as current_file:
            lines = current_file.read().splitlines()
        for i, line in enumerate(lines):
            # Exact word match: 'word' and 'testword' are treated as different
            if find_word(line, search):
                f.write('Found in (' + filename + ') at line number: (' + str(i + 1) + ') ')
                # The timestamp is the next non-blank line after the matching text
                timestamp = next((l.strip() for l in lines[i + 1:] if l.strip()), '')
                f.write('time: (' + timestamp + ') \n')

Make sure you follow the comments.

(1) create result.txt

(2) keep all the transcript files in a folder

(3) make sure (1) and (2) are not in the same folder

(4) The paths will look different on a Unix-based system (mine is Windows)

(5) Run the script after making those changes; it reads every transcript file in the folder and writes all the matches to a single file for your convenience

The output (in result.txt) looks like:

Found in (fileName.txt) at line number: (#lineNumber) time: (Time) .......

Md Sajid

How can I ensure that the user input will not be case sensitive?

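As soon as you read a file, convert its whole contents to capital letters (for example with `.upper()`). Similarly, when you accept user input, immediately convert it to capital letters. The two can then be compared without letter case getting in the way.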
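A minimal sketch of that idea, assuming both the line of text and the search phrase are ordinary strings. `contains_ignore_case` is a hypothetical helper name used only for illustration:

```python
def contains_ignore_case(text, query):
    """Return True if query occurs in text, ignoring letter case."""
    # Upper-case both sides so 'Office', 'OFFICE' and 'office' all compare equal
    return query.upper() in text.upper()


line = "Then he went to the Office"
print(contains_ignore_case(line, "office"))
print(contains_ignore_case(line, "mall"))
```

Converting a copy for the comparison, rather than overwriting the original text, means you can still print the line as it appears in the transcript.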

How can I open many text files systematically with a loop?

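import os

FolderFilepath = "transcripts"  # placeholder: path to the folder holding your text files
Files = os.listdir(FolderFilepath)

for file in Files:
    # Build the full path, then read the file so it can be searched
    with open(os.path.join(FolderFilepath, file)) as f:
        text = f.read()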

I have titled the text files {1,2,3,4...} to try to make this easy to implement but I still don't know how to do it.
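Because the files are numbered, one option is to build each name from a counter instead of listing the folder. The sketch below assumes the files are called `1.txt`, `2.txt`, and so on, and live in a folder named `transcripts`; the folder name, the `.txt` extension, and the helper `numbered_paths` are all assumptions for illustration:

```python
import os


def numbered_paths(folder, count, extension=".txt"):
    """Build the paths folder/1.txt ... folder/<count>.txt in numeric order."""
    return [os.path.join(folder, str(n) + extension) for n in range(1, count + 1)]


# Hypothetical folder name and file count; adjust both to your setup
for path in numbered_paths("transcripts", 50):
    if not os.path.exists(path):
        continue  # skip any number that has no file
    with open(path) as f:
        text = f.read()
    print(path, len(text.splitlines()), "lines")
```

Building the names from a counter keeps the files in numeric order, whereas `os.listdir` returns names in arbitrary order.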

Have fun.

AndrewGraham

Maybe these thoughts are helpful: first, I would find a way to get hold of all the files I need (even if it's only their paths). The filtering can then be done with regex; online testers such as regex101 let you try out your patterns, which I found super helpful when I first looked at regex.

Then I would experiment with extracting all the timestamps, places, etc. to check that the filter is working correctly.
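As a rough sketch of the timestamp step, the pattern below assumes each timestamp sits alone on its own line in `MM:SS` or `HH:MM:SS` form, as in the sample from the question. `TIMESTAMP` and `pair_with_timestamps` are hypothetical names, and the pattern would need adjusting if the real files differ:

```python
import re

# Assumed timestamp shape: MM:SS or HH:MM:SS alone on a line
TIMESTAMP = re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$")


def pair_with_timestamps(text):
    """Return (spoken_line, timestamp) pairs from transcript text."""
    pairs = []
    pending = None  # most recent spoken line still waiting for its timestamp
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if TIMESTAMP.match(line):
            if pending is not None:
                pairs.append((pending, line))
                pending = None
        else:
            pending = line
    return pairs


sample = "then he went to the office\n\n00:03\n\nto the mall\n\n00:08\n"
for spoken, stamp in pair_with_timestamps(sample):
    print(stamp, spoken)
```

Once the lines are paired like this, searching reduces to checking each spoken line and printing its stored timestamp.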

Have fun!

Tabea
searched_text = input("Type searched text: ")

with open("file.txt", "r") as file:
    # Drop blank lines so each text line is directly followed by its timestamp
    lines = [line.strip() for line in file if line.strip()]

for i in range(len(lines) - 1):
    # Compare in lower case so the search is not case sensitive
    if searched_text.lower() in lines[i].lower():
        print(searched_text + " appeared in " + lines[i] + " at datetime " + lines[i + 1])

Here is code that does what you describe for a single file.

Franciszek Job