I am trying to return a portion of the file name after using os.scandir to loop through all .txt files. I take a directory, search each text file in it for certain words, pull the lines those words are found in, and print them. That part works, but I also need to include the name of the file each piece of text was found in. Something like: HD 354950 : supply chain issues were found with garden gnomes.
Below is the working code that returns only the matching text from the files:
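    import os
    import re

    # Note: "dict" shadows the built-in type; the name is kept because it is used below.
    dict = []
    linenumber = 0  # running count across all files, not reset per file
    pattern = re.compile(r"\bsupply|finance\b", re.IGNORECASE)
    # "directory" is the path of the folder that holds the .txt files.
    for filename in os.scandir(directory):
        if filename.path.endswith(".txt"):
            # The with-block closes the file once its lines have been read.
            with open(filename, encoding='utf-8') as f:
                lines = f.readlines()
            for line in lines:
                linenumber += 1
                if pattern.search(line) is not None:
                    dict.append((linenumber, line.rstrip('\n')))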
When the text is returned, I want to be able to pull the name of the file that the text was found in, alongside the text itself. The filename typically looks like HD_0000354950_10Q_20200503_Item1A_excerpt.txt, and I want to return HD 354950.
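To make the wanted transformation concrete, here is a minimal sketch of turning such a file name into the short label. It assumes the name always begins with a ticker and a zero-padded number separated by underscores; the helper name `label_from_filename` is made up for illustration and is not part of the code above.

```python
def label_from_filename(name):
    """Build a label such as 'HD 354950' from the first two underscore-separated fields."""
    # Assumption: the first field is the ticker and the second is the zero-padded number.
    ticker, number = name.split("_")[:2]
    # Drop the leading zeros; fall back to "0" if the field is all zeros.
    return f"{ticker} {number.lstrip('0') or '0'}"
```

A name without an underscore would raise a ValueError here, so this only fits files that follow the pattern shown.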
I would like to combine this with what is printed by:
    for d in dict:
        print(filenamepieces, ":" + d[1])
where 'filenamepieces' is the label of the file that the piece of text was taken from.
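One possible way to get that combined output is to store the label next to each match while the files are being scanned, so the label is still available at print time. The sketch below is only an illustration under stated assumptions: `label_from_filename`, `find_matches`, and `format_matches` are hypothetical names, the file names are assumed to follow the TICKER_number_... pattern, and line numbers restart for each file, which differs from the running count in the code above.

```python
import os
import re


def label_from_filename(name):
    # Assumption: the first two underscore-separated fields are ticker and padded number.
    ticker, number = name.split("_")[:2]
    return f"{ticker} {number.lstrip('0') or '0'}"


def find_matches(directory, pattern):
    """Return (label, linenumber, text) tuples for every matching line in the .txt files."""
    results = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith(".txt"):
            label = label_from_filename(entry.name)
            with open(entry.path, encoding="utf-8") as f:
                # enumerate restarts at 1 for each file
                for linenumber, line in enumerate(f, start=1):
                    if pattern.search(line):
                        results.append((label, linenumber, line.rstrip("\n")))
    return results


def format_matches(results):
    """Turn the stored tuples into 'LABEL : text' strings."""
    return [f"{label} : {text}" for label, _, text in results]
```

Printing each string from `format_matches(find_matches(directory, pattern))` would then give lines in the form HD 354950 : supply chain issues were found with garden gnomes.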