I am trying to search for a string in multiple files. My code works fine, but for big text files it takes a few minutes.

import logging
import os
import sys

wrd = b'my_word'
path = r'C:\path\to\files'  # raw string, so \t and \f are not read as escapes

# Scan every .txt file in the directory
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        with open(os.path.join(path, f), 'rb') as ofile:
            # Compare every line of the file against the search string
            for line in ofile:
                if wrd in line:
                    try:
                        sendMail(...)
                        logging.warning('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
                    except IOError as e:
                        logging.error('Operation failed: {}'.format(e.strerror))
                        sys.exit(0)

I found this topic: "Python finds a string in multiple files recursively and returns the file path", but it does not answer my question.

Do you have any idea how to make it faster?

I am using Python 3.4 on Windows Server 2003.

Thx ;)

BaFouad

1 Answer

My files are generated by an Oracle application, and if there is an error, I log it and stop generating the files.

So I search for the string by reading the files from the end, because the string I am looking for is an Oracle error and appears at the end of the files.

import logging
import os
import sys

wrd = b'ORA-'
path = r'C:\path\to\files'  # raw string, so \t and \f are not read as escapes

# Scan every .txt file in the directory
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        # Binary mode, so each line is bytes and can be compared with wrd
        with open(os.path.join(path, f), 'rb') as ofile:
            try:
                ofile.seek(0, 2)                     # Seek to the end of the file
                fsize = ofile.tell()                 # Get the file size
                ofile.seek(max(fsize - 1024, 0), 0)  # Move to the last 1024 bytes
                lines = ofile.readlines()            # Read to the end

                lines = lines[-10:]                  # Keep the last 10 lines
                for line in lines:
                    if wrd in line:
                        sendMail(.....)
                        logging.error('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
            except IOError as e:
                logging.error('Operation failed: {}'.format(e.strerror))
                sys.exit(0)
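
For reuse, the tail-reading idea can be pulled out into a small helper. This is only a minimal sketch, not the exact code I run: the names tail_contains and find_first_match and the tail_bytes parameter are made up for illustration, and it tests the whole tail at once instead of splitting it into the last 10 lines. It assumes the error text always lies within the last tail_bytes bytes of the file; a match that straddles that cutoff would be missed.

```python
import os


def tail_contains(file_path, pattern, tail_bytes=1024):
    """Return True if `pattern` (bytes) occurs in the last `tail_bytes` bytes."""
    with open(file_path, 'rb') as fh:
        fh.seek(0, os.SEEK_END)             # jump to the end to learn the size
        size = fh.tell()
        fh.seek(max(size - tail_bytes, 0))  # back up, but never before the start
        return pattern in fh.read()         # one substring test on the tail


def find_first_match(directory, pattern, tail_bytes=1024):
    """Return the name of the first .txt file whose tail contains `pattern`."""
    for name in sorted(os.listdir(directory)):
        if name.endswith('.txt') and tail_contains(
                os.path.join(directory, name), pattern, tail_bytes):
            return name
    return None  # no file matched
```

Calling find_first_match(path, b'ORA-') would then give the file to report, and the mail and logging calls can stay outside the search loop.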
BaFouad
  • Good approach, but still... if there are multiple files and you don't know which one has the error, you could use the multiprocessing (mp) module to run your code two or more times at once, depending on CPU power and the number of cores. Because of the GIL, Python threads in a single process run Python code serially, not in parallel. Implementing mp enables you to run multiple instances in parallel. – ZF007 Feb 21 '18 at 11:42
  • Thx ZF007, I will look into how multiprocessing works, but my code quickly finds the file that has the error on the first attempt, so I don't think I need mp in this case. – BaFouad Feb 22 '18 at 15:59
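
If every file does have to be scanned in full, the parallel idea from the comments could look roughly like the sketch below. Note that it swaps in a thread pool (concurrent.futures.ThreadPoolExecutor) rather than the multiprocessing module: the work here is mostly file reading, and CPython releases the GIL while waiting on file I/O. The names file_has_pattern and scan_directory and the max_workers value are hypothetical, and whether this is actually faster depends on the disk, so it should be measured rather than assumed.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def file_has_pattern(file_path, pattern):
    """Return True if any line of the file contains `pattern` (bytes)."""
    with open(file_path, 'rb') as fh:
        return any(pattern in line for line in fh)


def scan_directory(directory, pattern, max_workers=4):
    """Return the sorted names of .txt files in `directory` containing `pattern`."""
    names = sorted(n for n in os.listdir(directory) if n.endswith('.txt'))
    paths = [os.path.join(directory, n) for n in names]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `paths`
        hits = list(pool.map(lambda p: file_has_pattern(p, pattern), paths))
    return [name for name, hit in zip(names, hits) if hit]
```

For CPU-heavy matching, ProcessPoolExecutor from the same module is the process-based counterpart, but the worker must then be a module-level function instead of a lambda, and on Windows the pool has to be created under an `if __name__ == '__main__':` guard.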