This is probably horribly inefficient, but I am trying to find a way to build a list of the files in a directory (there are tens of thousands), extract information from each file, and then build a cache file so that I only check NEW files for this information.

Right now I keep a StringCollection in Properties.Settings.Default.FileCache, and my application runs like this:

Parsing Process:

1- Iterate through all folders and subfolders to build the file list

2- Reload cache file and compare (Explained later since it probably makes more sense to explain how I am building it in the first place before I explain how I am comparing)

3- Parse the information I want from new files

4- Properties.Settings.Default.FileCache.Add(FileName + "|" + Information1 + "|" + Information2)

Reloading Cache and comparing:

1- Split each cache entry into its three values and put them in a List

2- If a file exists in the cache list, I remove it from the new list

3- For any remaining files, I go to step 3 of the parsing process above.

This seems horribly inefficient. But I am new to C# and it is the only method I have come up with so far.

Cade

1 Answer

It seems like you can save yourself a little trouble by loading the cache first and creating a HashSet<string> containing all of the file names that already exist in the cache.

Then iterate through the folders. For each file, first see if it's in the cache. If it's not in the cache, then parse the information you want and add that name to the cache.

That way the amount of information you're holding in memory is smaller (i.e. you don't have to keep all of the file names around), and you look at a file one time. If it's already in the cache, then ignore it. If it's not in the cache, extract the information you want and add to the cache. Then move on.
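As a rough illustration of that flow, here is a minimal sketch. The class and method names (`CacheScan`, `ScanForNewFiles`) and the `parse` delegate are hypothetical stand-ins, not anything from the question; the only framework calls used are `Directory.EnumerateFiles` and `HashSet<string>.Add`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CacheScan
{
    // Hypothetical helper: walks root recursively, parses only files that are
    // not yet in the cache, and returns the new "FileName|info" entries.
    public static List<string> ScanForNewFiles(
        string root, HashSet<string> cachedNames, Func<string, string> parse)
    {
        var newEntries = new List<string>();

        // EnumerateFiles streams paths lazily instead of building one big array.
        foreach (string path in Directory.EnumerateFiles(
                     root, "*", SearchOption.AllDirectories))
        {
            // HashSet<T>.Add returns false when the item is already present,
            // so a single call both tests membership and records the new name.
            if (!cachedNames.Add(path))
                continue;

            // parse stands in for whatever extracts the information you need.
            newEntries.Add(path + "|" + parse(path));
        }
        return newEntries;
    }
}
```

On Windows it is probably worth constructing the set with `StringComparer.OrdinalIgnoreCase`, since file names there are not case-sensitive. Note also that a recursive enumeration can throw `UnauthorizedAccessException` if it hits a folder the process cannot read.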

Unless you can be notified somehow of new files (for example, your program is always running and has a FileSystemWatcher monitoring the directory), that's the best you can do.

Jim Mischel
  • I would recommend not relying too much on the `FileSystemWatcher`. It sometimes kind of _loses track_, at least for drives like network shares: a brief hiccup on your network switch at the wrong moment and you get no more notifications for new files. That's why we additionally poll and compare from time to time (about once an hour for our use case). See http://stackoverflow.com/a/11307399/303290 for additional info. – mbx Dec 19 '14 at 17:50
  • I cannot have FileSystemWatcher monitor the drive. The application needs to run as a daily batch process. HashSet seems to be a very useful data type, I appreciate that. What do you think about how I am storing the information for the next execution? – Cade Dec 19 '14 at 18:15
  • @Cade: As I understand it, you're storing the data in the app.config file. That's okay for a small utility app, but I wouldn't suggest it for a production application. First, your program shouldn't in general be able to modify anything in the application directory. You should store the data in a different place, and it should be clearly identified as a data file. – Jim Mischel Dec 19 '14 at 19:00
  • I ended up doing something completely different. The HashSet compared the files to the list a lot faster than I thought it would (several thousand in less than a second). I created a delimited file with a list of the files and the data I needed from them, then I just loaded it into a HashSet and compared the file names against the directory enumeration. Works like a charm! Thanks for your help! – Cade Dec 22 '14 at 20:05
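For readers who want to try the delimited-file approach from the last comment, here is one possible sketch. It assumes the `FileName|Information1|Information2` line format from the question and stores the file under the per-user application data folder rather than next to the executable. The class name `FileCacheStore`, the folder name `MyBatchApp`, and the file name `filecache.txt` are placeholders, not the asker's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileCacheStore
{
    // Assumed location: a per-user data folder, not the application directory.
    // "MyBatchApp" and "filecache.txt" are placeholder names.
    public static string DefaultPath()
    {
        string dir = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
            "MyBatchApp");
        Directory.CreateDirectory(dir); // no-op if the folder already exists
        return Path.Combine(dir, "filecache.txt");
    }

    // Reads "FileName|Information1|Information2" lines and returns the file names.
    public static HashSet<string> LoadNames(string cachePath)
    {
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        if (!File.Exists(cachePath))
            return names; // first run: nothing cached yet

        foreach (string line in File.ReadLines(cachePath))
        {
            int sep = line.IndexOf('|');
            if (sep > 0)
                names.Add(line.Substring(0, sep));
        }
        return names;
    }

    // Appends the entries for newly parsed files to the end of the cache file.
    public static void Append(string cachePath, IEnumerable<string> entries)
    {
        File.AppendAllLines(cachePath, entries);
    }
}
```

The `|` delimiter is convenient on Windows because it is not a legal character in file names there; the information fields would still need escaping if they could contain it.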