
Hoi!

I am indexing files on a file server using os.walk(): I check certain files for their content and add them to a list to be reviewed later. It works quite nicely for smaller folders, but walking the whole server takes quite some time (>1 h), and data is constantly being added to the server.

How does this impact my indexing? Which files will be included? The ones that happen to be in a given location at the moment it is scanned? I am mainly interested in a "snapshot" at a certain moment in time.

Thank you!

Kind regards,

Sebastian

This is the code I am currently using. Is there maybe a better/faster way to do this?

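import os

# my_path is the root folder to index (defined elsewhere).
file_list = []
for current_folder, sub_folders, file_names in os.walk(my_path):
    for file in file_names:
        # Only files named exactly "title" are of interest.
        if file == "title":
            with open(os.path.join(current_folder, file), "r") as f:
                title = f.read()
            # Store the file name, its folder, and its content for later review.
            file_list.append([file, current_folder, title])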
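To make the question concrete: os.walk() is a generator that lists each folder only when it reaches it, so the result is not a point-in-time snapshot. A file added to a folder that was already listed is missed, and a file deleted after its folder was listed makes open() fail. Below is a minimal sketch of a walk that tolerates this and flags files modified after the scan began. The function name index_titles and the returned lists are hypothetical, and comparing mtime against the local clock assumes the server and client clocks roughly agree.

```python
import os
import time


def index_titles(root, target_name="title"):
    """Collect [name, folder, content] for every file called target_name.

    Returns (found, skipped, changed). The names are illustrative only.
    """
    scan_start = time.time()
    found, skipped, changed = [], [], []
    for folder, _sub_folders, file_names in os.walk(root):
        for name in file_names:
            if name != target_name:
                continue
            path = os.path.join(folder, name)
            try:
                with open(path, "r") as f:
                    content = f.read()
                    # Ask the open handle for its mtime to avoid a second path lookup.
                    mtime = os.fstat(f.fileno()).st_mtime
            except OSError:
                # Deleted, moved, or unreadable after its folder was listed.
                skipped.append(path)
                continue
            found.append([name, folder, content])
            if mtime > scan_start:
                # Modified while the scan was running; not part of a clean snapshot.
                changed.append(path)
    return found, skipped, changed
```

Calling found, skipped, changed = index_titles(my_path) gives the same list as the loop above, plus two lists of paths that may deserve a second look. It does not detect files added to folders that were already visited.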
nocab
  • Use file events instead of looping over the file system yourself. https://stackoverflow.com/questions/182197/how-do-i-watch-a-file-for-changes – 宏杰李 Apr 12 '18 at 12:32
  • Thank you for your answer! I do not really care that much about the new files. I want to check all existing ones and to understand what happens if the folders and contents change, while I am searching them. – nocab Apr 12 '18 at 12:40
  • Hey! From what I understand, you want to index on a snapshot of the file system. If any OS+file system supports this, of course it would be very slow. I think you can make a simple algorithm like walk that uses a hash function to check if it shall follow a specific route on the file system tree. Some software like git and robocopy use this kind of logic. But I'm pretty sure what you need is already implemented out there. – tur1ng Apr 12 '18 at 12:47
  • If you want to check whether another application is manipulating a file while you are walking past it, I think you have to use some kind of IPC. You can also lock files: https://linux.die.net/man/2/flock – tur1ng Apr 12 '18 at 12:53
  • Thanks, I will check that out! – nocab Apr 17 '18 at 06:47
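
The flock suggestion in the comments above could look roughly like this in Python, using the standard fcntl module (Unix only). This is a sketch under assumptions: flock locks are advisory, so it only helps if the programs writing to the server also take a lock, and network file systems may not honor it. The helper name read_if_unlocked is hypothetical.

```python
import fcntl


def read_if_unlocked(path):
    """Return the file's text, or None if another holder has it exclusively locked."""
    with open(path, "r") as f:
        try:
            # Shared, non-blocking lock: raises at once if a writer holds LOCK_EX.
            fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except OSError:
            return None
        try:
            return f.read()
        finally:
            # Release the lock explicitly before the file is closed.
            fcntl.flock(f, fcntl.LOCK_UN)
```

Inside the walk, a None result would mark a file to revisit later rather than one to index right away.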
