
I am trying to return a portion of the file name after using os.scandir to loop through all .txt files. I take a directory, search each text file in it for certain words, pull out the section where those words are found, and print it. That part works, but I also need to add the name of the file the text was found in. The output should look something like: HD 354950 : supply chain issues were found with garden gnomes.

Below is the working code that returns only the matching text from the files:

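import os
import re

directory = "path/to/texts"  # placeholder: folder containing the .txt files

dict = []  # (line number, line) pairs; note this name shadows the built-in dict
pattern = re.compile(r"\b(?:supply|finance)\b", re.IGNORECASE)  # group so \b applies to both words

for filename in os.scandir(directory):  # yields os.DirEntry objects
    if filename.path.endswith(".txt"):
        with open(filename.path, encoding="utf-8") as f:  # file is closed automatically
            for linenumber, line in enumerate(f, start=1):  # line numbers restart for each file
                if pattern.search(line) is not None:
                    dict.append((linenumber, line.rstrip("\n")))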
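Grouping the alternatives in the search pattern matters: in a regular expression "|" binds more loosely than anything else, so \bsupply|finance\b is read as "\bsupply" or "finance\b", and each word boundary applies to only one word. A minimal comparison, using made-up sample lines, shows the difference:

```python
import re

# Ungrouped: parsed as (\bsupply) | (finance\b)
loose = re.compile(r"\bsupply|finance\b", re.IGNORECASE)
# Grouped: both word boundaries apply to either word
grouped = re.compile(r"\b(?:supply|finance)\b", re.IGNORECASE)

# Hypothetical sample lines, chosen to show where the two patterns disagree
samples = ["Supplying parts", "refinance plan", "supply chain", "finance team"]

for text in samples:
    print(f"{text!r:20} loose={bool(loose.search(text))} grouped={bool(grouped.search(text))}")
```

The ungrouped pattern also matches "Supplying" and "refinance", which is usually not what a whole-word search intends.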

When the text is returned, I want to pull the name of the file it was found in alongside the text itself. The file name typically looks like HD_0000354950_10Q_20200503_Item1A_excerpt.txt, and I want to return HD 354950.

I would like to combine this with the output of:

for d in dict:
    print(filenamepieces, ":", d[1])

where 'filenamepieces' is the shortened name (e.g. HD 354950) of the file the text snippet was taken from.

Answer


Here's an example using split() and converting the string to int:

fileName = "HD_0000354950_10Q_20200503_Item1A_excerpt.txt" # The name of the file

splitFile = fileName.split("_") # Splits the file name with underscores (_) into sections 
index1 = splitFile[0] # Gets the name at the first index
index2 = splitFile[1] # Gets the name at the second index
index2 = int(index2) # Converts the second piece to an int to drop the leading zeros

finale = f"{index1} {index2}" # Final string
print(finale) # Prints the final string

# Program outputs : HD 354950
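To attach that short name to each match, the same split can be applied inside the os.scandir loop and stored with every matching line. The sketch below assumes every file name follows the TICKER_zero-padded-number_... layout from the question; the helper names short_name and find_matches are hypothetical, and the demo folder is a throwaway temporary directory. It uses entry.name rather than entry.path so underscores in the folder path don't affect the split.

```python
import os
import re
import tempfile

# Grouped alternation so the word boundaries apply to both words
PATTERN = re.compile(r"\b(?:supply|finance)\b", re.IGNORECASE)


def short_name(file_name):
    """Turn 'HD_0000354950_10Q_..._excerpt.txt' into 'HD 354950'.

    Assumes the ticker and a zero-padded number are the first two
    underscore-separated fields, as in the example file name.
    """
    ticker, number = file_name.split("_")[:2]
    return f"{ticker} {int(number)}"  # int() drops the leading zeros


def find_matches(directory):
    """Return (short name, line number, line text) for every matching line."""
    matches = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith(".txt"):
            label = short_name(entry.name)  # base name only, not the full path
            with open(entry.path, encoding="utf-8") as f:
                for linenumber, line in enumerate(f, start=1):
                    if PATTERN.search(line):
                        matches.append((label, linenumber, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    # Demo with a temporary folder; point find_matches at your own directory instead
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "HD_0000354950_10Q_20200503_Item1A_excerpt.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("Intro line\nsupply chain issues were found with garden gnomes\n")
        for label, _, text in find_matches(tmp):
            print(label, ":", text)
```

Run as a script, the demo prints HD 354950 : supply chain issues were found with garden gnomes. Storing the label in the tuple keeps each snippet tied to its source file, so the print loop no longer needs a separate filenamepieces variable.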
creed