1

I need to create an application that browses directories on a PC and builds a list of the files it finds (the path of each file).

Since I later need to randomly re-order this list, I want to store it in a data structure that is fast and easy to recall and manage (i.e. I don't want to re-scan the PC's file system every time I open the application).

So, once the scan is done, I want to store the list and use it until I scan the PC again (now, and in the future whenever I need to use this application).

What's the best data storage for this kind of application? Since I don't have any database, I guess the data must be stored with the application (or at least in some data file inside the application's directory). I believe a .txt file would be slow and terrible :)

What can you suggest? I think I'll use Windows Forms. SQLite?

markzzz
  • 47,390
  • 120
  • 299
  • 507
  • What will you be doing with that list? – Yuval Itzchakov Jul 16 '15 at 08:58
  • As I wrote: shuffle the file paths whenever I want; once randomized, after a selection I want to drag & drop files into another application. – markzzz Jul 16 '15 at 09:00
  • How big is the list? Have you tried using a .txt file to see whether what you believe is true? – PaulF Jul 16 '15 at 09:02
  • 1
    Quite huge. Something like 1,000,000 files (they are musical samples). – markzzz Jul 16 '15 at 09:03
  • If all you want to store is the list of files line by line, then a simple .txt file is probably the fastest. If you are prepared to structure the data, then using XML to copy the folder structure could result in a much smaller file. If it is a once only operation on starting the application (& once when saving it) - then try the .txt file to see if there is an acceptable delay. – PaulF Jul 16 '15 at 09:20
  • 2
    I would recommend some form of **binary** structured tree method. It's fast; easily navigable; doesn't require you to load everything into memory. XML is not a wise choice: http://stackoverflow.com/questions/132330/maximum-size-for-xml-files. I think your SQLite idea has great merit –  Jul 16 '15 at 09:41
  • @MickyDuncan: but where do I store that binary structure? I know IList or HashTable is fast, but I need to persist the list once the file system is scanned. That's the problem... – markzzz Jul 16 '15 at 09:59
  • @markzzz Perhaps `Environment.CommonApplicationData` - _"[The directory that serves as a common repository for application-specific data that is used by all users](https://msdn.microsoft.com/en-us/library/system.environment.specialfolder(v=vs.110).aspx)"_ –  Jul 16 '15 at 10:40
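To make the SQLite suggestion from the comments concrete, here is a minimal sketch (using Python's built-in `sqlite3` for brevity; from a C#/Windows Forms application the same schema works through an SQLite library such as System.Data.SQLite). The table and column names are illustrative, not from the question.

```python
import os
import sqlite3

def build_index(root, db_path="files.db"):
    """Scan `root` once and store every file path in a local SQLite file."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY)")
    with con:  # commit the whole scan in one transaction
        for dirpath, _dirs, names in os.walk(root):
            con.executemany(
                "INSERT OR IGNORE INTO files (path) VALUES (?)",
                ((os.path.join(dirpath, n),) for n in names),
            )
    return con

def random_order(con):
    """Recall the stored list in random order without rescanning the disk."""
    return [row[0] for row in con.execute("SELECT path FROM files ORDER BY RANDOM()")]
```

On later runs the application can open the existing database file and call `random_order` directly, which is exactly the "scan once, reuse until the next scan" workflow the question describes.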

4 Answers

1

I can say that the file type or extension will not make much difference; the key here is how you structure your data inside the file for fast writes/reads.
In your case I would suggest using the composite pattern and an .xml file to store the file paths and directory structure for later use.
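As an illustration of the composite idea (not the answerer's actual code), here is a Python sketch that mirrors a directory tree as nested XML nodes; the element and attribute names are made up for the example. The same shape is straightforward to produce in C# with System.Xml.Linq.

```python
import xml.etree.ElementTree as ET

def dir_to_xml(name, entries):
    """Composite pattern: a <dir> node holds <file> leaves and nested <dir> nodes.

    `entries` maps an entry name to None (a file) or to another dict (a sub-directory).
    """
    node = ET.Element("dir", name=name)
    for child_name, child in sorted(entries.items()):
        if child is None:
            ET.SubElement(node, "file", name=child_name)
        else:
            node.append(dir_to_xml(child_name, child))
    return node

# Hypothetical scan result: one file plus a sub-folder with one file.
tree = dir_to_xml("samples", {"kick.wav": None, "loops": {"beat.wav": None}})
xml_text = ET.tostring(tree, encoding="unicode")
```

Because directories only store the names of their children, the full path of each file is recovered by walking from the root, which keeps the file considerably smaller than a flat list of absolute paths.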

Amr Elgarhy
  • 66,568
  • 69
  • 184
  • 301
1

Store the list as JSON. Since you need a dictionary, which is just a list of name/path pairs, this is pretty much what JSON was designed for. Serialize the dictionary to JSON and store it locally; later, just deserialize it and pass a name as the key to get the file path. There are quite a few decent, free .NET JSON libraries.
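A minimal sketch of the round trip this answer describes (shown with Python's standard `json` module for brevity; in .NET a library such as Json.NET does the same). The dictionary contents are illustrative.

```python
import json

# Illustrative name -> path dictionary produced by the scan step.
index = {"kick.wav": r"C:\samples\drums\kick.wav"}

# Serialize once after scanning...
text = json.dumps(index)

# ...and on later runs deserialize instead of rescanning,
# then look up a path by file name.
reloaded = json.loads(text)
path = reloaded["kick.wav"]
```

In practice `text` would be written to a file next to the application and read back at startup.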

Kashif
  • 2,926
  • 4
  • 20
  • 20
  • 1
  • Yes. BUT: how do I store it? That's the problem. I don't want it to take 10 minutes to reload the data into memory when I reopen the program... – markzzz Jul 16 '15 at 10:12
  • Who told you it will take 10 min? Reading 10,000 records should take barely 2 seconds. – Kashif Jul 16 '15 at 10:16
  • @Kashif OP wants 1,000,000 records so `_"2 seconds"_ * 1000000/10000 = 200 seconds` or **3.333 minutes**. –  Jul 16 '15 at 10:44
  • you should check https://msdn.microsoft.com/en-us/library/dd460693(v=vs.110).aspx – Kashif Jul 16 '15 at 10:46
  • Parallel programming can increase performance. – Kashif Jul 16 '15 at 10:47
  • OR I would recommend reading the records chunk by chunk (for example, the first 200, then the next 200, and so on). Just find a number you are comfortable with. – Kashif Jul 16 '15 at 10:51
  • @Kashif TPL is not suitable for I/O because it does not use I/O Completion Ports. TPL is for compute. http://stackoverflow.com/questions/8505815/how-to-properly-parallelise-job-heavily-relying-on-i-o. `async` is better suited though I question the benefit of concurrent I/O on the same file –  Jul 16 '15 at 13:56
0

I would recommend using an .xml file, even if it's not really advised for data storage. Example:

<file>
    <path>...</path>
    <name>...</name>
    <size>...</size>
    ...
</file>
Antt
  • 159
  • 2
  • 13
0

Your requirement is for random access, therefore you could use a keyed dictionary data source (assuming your key has been randomly selected by some other process or by the user). If you are storing file names then these have a fixed maximum size, which means you are in the happy position of being able to use some very old, but very fast, techniques.

Since you want the highest possible performance you will want an in-process memory store, not a Windows-service-based store.

If you have a low volume of data writes and a very high volume of data reads, then I'd recommend sorting your data, writing it to a fixed-size-row data file, and using a binary-chop (binary search) mechanism to find the key you want. Open the data file with a reasonably large buffer size and keep it open, using Seek to move around the fixed-size records. When data is written, append it to the end of the file in an unsorted block until that block grows above a certain limit, then re-sort the whole file and re-write it. When searching via binary chop, if you don't find the key you are after, search through the unsorted additions as well.

Binary chopping is super fast and avoids the need to maintain an index. It works no matter how unbalanced your data set is in terms of selectiveness and spread.
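A sketch of the fixed-size-record file plus binary chop described above (in Python for brevity; the 260-byte record size is an illustrative choice matching Windows' classic MAX_PATH, and the function names are made up for the example).

```python
import struct

# One fixed-size record: a single 260-byte, NUL-padded path (illustrative).
RECORD = struct.Struct("260s")

def write_records(fh, paths):
    """Write sorted, fixed-size records so the file can be binary-searched."""
    for p in sorted(paths):
        fh.write(RECORD.pack(p.encode("utf-8")))

def binary_chop(fh, target):
    """Seek-based binary search over the fixed-size records; no index needed."""
    fh.seek(0, 2)                       # jump to end to learn the file size
    lo, hi = 0, fh.tell() // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        fh.seek(mid * RECORD.size)      # fixed rows make random access trivial
        key = RECORD.unpack(fh.read(RECORD.size))[0].rstrip(b"\0").decode("utf-8")
        if key == target:
            return mid
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    return None                         # caller would then scan the unsorted tail
```

The same Seek-based pattern translates directly to a `FileStream` in C#; the unsorted "additions" block the answer mentions would simply live past the last sorted record and be scanned linearly on a miss.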

PhillipH
  • 6,182
  • 1
  • 15
  • 25