I am working on an application that keeps track of and uses files stored on the file system. Users are allowed to open, create, delete and move files in the file system. However, my application is not running all the time, so I can't keep track of all changes in real time. Afterwards, my application has to find out which file is which (i.e., which file corresponds to which item as identified in my application).

Ideally, the application resolves every change by itself; any required user interaction is less desirable.

One of my ideas was to use an attribute of a file and assign a key value to it, so that once a file has been identified, it can always be recognized afterwards. But I don't know whether such an attribute exists. This article didn't give much hope: "There is in Windows file systems a pre computed hash for each file?"

Does anybody know whether there is such an attribute I can use? And how can I use it in C#?

Has anyone else run up against this problem? And how did you solve it?

I'd like to hear good suggestions.

regards, Jaap

Dev.Jaap
  • Under Linux/ext you could probably achieve this with hard links. I'm not sure whether anything similar exists in NTFS. – JohnB Aug 31 '12 at 14:07
  • So you have files in x number of directories and you need to identify each file by name, version/date, etc.; once file y has been read by your application, you need a key of some sort so that the next time y comes in, you recognize that you have already processed it? – LukeHennerley Aug 31 '12 at 14:12
  • Some applications like Office have attributes for files, but that is not at the NTFS level. Even if a hash was stored in NTFS the hash would change if the file was changed so would not do you much good. An application that keeps track of files is called a document management system. – paparazzo Aug 31 '12 at 14:12
  • It would help greatly if you tell what you want to do exactly. Keeping track of files when your application isn't running is hard, since files can be altered/moved/deleted without you knowing it. Why would you want to do that? If it's for example for some kind of media library, then you could perhaps just re-scan the directory(ies) the files are in. – CodeCaster Aug 31 '12 at 14:19
  • The shell link object can track files. – Raymond Chen Aug 31 '12 at 14:19
  • @CodeCaster We have a digital file (like a customer file) containing documents (i.e., files of arbitrary formats) that can be accessed through our application or through the Windows file system / file browser. Our application has to keep its internal records in sync with the actual situation on the file system. – Dev.Jaap Aug 31 '12 at 14:46
  • @JohnB I had to google hard links. I know about symbolic links, but aren't hard links actually the absolute file path? That only holds for as long as the user does not move the file to a new path. But maybe I do not fully understand the concept of hard links? – Dev.Jaap Aug 31 '12 at 14:50
  • @RaymondChen The "Shell link" looks rather interesting, but I'll have to investigate whether it can solve my problem. – Dev.Jaap Aug 31 '12 at 14:58
  • @Blam The hash code is going to change if the document content is changed. I had discovered the Office attributes, but these are also easily changed. And a plain text document does not have the Office document attributes. – Dev.Jaap Aug 31 '12 at 15:00
  • @Dev: A hard link is a second entry for a file in the file system. Suppose you have two dirs, /A and /B, and a file /A/file. If you then do "ln /A/file /B/file", both /A/file and /B/file point to the same file. Removing /A/file does not remove /B/file, and moving /A/file to /C/file does not change /B/file. Altering the contents of /A/file also affects /B/file, since both are the same file. So by comparing /A and /B, you could in principle see what the user has done since the point in time when you created the hard links. – JohnB Aug 31 '12 at 15:16
  • Yes, that is what I said: "the hash would change if the file was changed so would not do you much good". And a text document is not an Office document. – paparazzo Aug 31 '12 at 16:05
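JohnB's hard-link scheme can be sketched in Python as follows. This is a minimal illustration, not a tested product: it assumes the application keeps a private directory of hard links (a hypothetical layout) on the same volume as the user's files, since hard links cannot span volumes. `os.path.samestat` compares device and inode numbers, so it recognizes the user's entry even after a rename or move. Python's `os.link` and inode numbers also work on NTFS under Windows. One caveat: an editor that saves by writing a new file and renaming it over the old one breaks the link, because the user's path then refers to a different file.

```python
import os


def find_other_links(search_root: str, shadow_path: str) -> list:
    """Return paths under search_root that are hard links to the same file as shadow_path.

    Hypothetical helper: shadow_path is assumed to be a link the application created
    earlier (e.g. with os.link) in a private directory; search_root is where users
    move and rename files while the application is not running.
    """
    target = os.stat(shadow_path)
    shadow_abs = os.path.abspath(shadow_path)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.abspath(path) == shadow_abs:
                continue  # skip the application's own link
            try:
                st = os.stat(path)
            except OSError:
                continue  # e.g. a broken symlink; not our file
            # samestat compares device and inode numbers, i.e. the underlying file.
            if os.path.samestat(st, target):
                matches.append(path)
    return matches
```

If the result is empty and `os.stat(shadow_path).st_nlink` is 1, only the application's link is left, which means the user deleted their entry.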

2 Answers

If your files don't leave NTFS, this is easily achievable with alternate data streams (ADS), in which you can store your own data along with the files. This is a reasonably good article about ADS: http://www.flexhex.com/docs/articles/alternate-streams.phtml
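A minimal Python sketch of the ADS idea, assuming Windows and an NTFS volume: a file's named stream is addressed as `path:streamname`, the same naming the Win32 file APIs accept, and an application-specific ID is stored there. The stream name `myapp.id` and the function name are hypothetical. On non-Windows systems the colon is typically just part of an ordinary file name, so the same code would create a sibling file instead of a stream.

```python
import os
import uuid

# Hypothetical stream name; any name unique to your application would do.
STREAM_NAME = "myapp.id"


def get_or_assign_id(path: str) -> str:
    """Return the ID stored in the file's alternate data stream, creating one if absent.

    Assumes Windows + NTFS, where "file.ext:name" addresses a named stream that is
    stored alongside the file's normal contents and does not alter them.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    stream = f"{path}:{STREAM_NAME}"
    try:
        with open(stream, "r", encoding="utf-8") as f:
            existing = f.read().strip()
            if existing:
                return existing  # file was identified before
    except FileNotFoundError:
        pass  # no stream yet: first time we see this file
    new_id = str(uuid.uuid4())
    with open(stream, "w", encoding="utf-8") as f:
        f.write(new_id)
    return new_id
```

The returned ID would be the key stored in the application's own records, so a moved or renamed file can be matched back to its record by reading the stream again.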

There is another suitable method, the USN Change Journal; see http://msdn.microsoft.com/en-us/library/windows/desktop/aa363798.aspx. It's very efficient, but also very complicated to use, and it requires quite good knowledge of NTFS internals. With the USN Change Journal, you can very efficiently retrieve all files that were changed (even every individual change event) within a specified time period.

Note, however, that if your files leave the NTFS realm, e.g. if a file is copied to FAT32, the contents of its ADS are lost.

Robert Goldwein
  • This appears to be a rather interesting option. Although NTFS is not the only file system we support, I think it is worth investigating its applicability. Thanks. – Dev.Jaap Sep 03 '12 at 06:07

Relying on a file attribute is "dangerous" in that a user could alter the attribute while your program isn't running. This could lead you to believe that a certain file is (or isn't) tracked by the program when in reality it isn't (or is).

I would suggest keeping track of the files in a database, an XML file, or some other file. When your application starts, you read the file/database and check for new, deleted, and edited files.
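A minimal Python sketch of that start-up check, assuming a plain dict (path relative to the watched root → SHA-256 of the contents) stands in for the database or XML file; saving and loading it is left out, and the function names are hypothetical. The stored snapshot is compared with a fresh scan, path by path.

```python
import hashlib
import os


def file_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def take_snapshot(root: str) -> dict:
    """Map every file under root (as a path relative to root) to its content hash."""
    snapshot = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            snapshot[os.path.relpath(full, root)] = file_hash(full)
    return snapshot


def diff_snapshots(old: dict, new: dict) -> dict:
    """Classify changes between the stored snapshot and the current one, by path."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "deleted": sorted(old_paths - new_paths),
        # Same path, different content hash: the file was edited in place.
        "edited": sorted(p for p in old_paths & new_paths if old[p] != new[p]),
    }
```

On start-up the application would load the previous snapshot, call `take_snapshot` on the document root, and update its records from the result of `diff_snapshots`. Hashing every file on each start can be slow for large trees; comparing size and modification time first and only hashing files whose values changed is a common shortcut.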

You could store a hash of each file to find out whether a file has been moved or edited. Keeping track of files that are both moved AND edited is going to be pretty difficult (I have no clue how you could achieve it).
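The hash heuristic for moves can be sketched in Python like this, assuming the stored and the current state are both dicts mapping path → content hash (the function name is hypothetical). A file that vanished is paired with a file that appeared only when exactly one of each shares the same hash. Identical copies are left ambiguous, and a file that was both moved and edited falls through as a deletion plus an addition, which is exactly the hard case described above.

```python
from collections import defaultdict


def detect_moves(old: dict, new: dict) -> list:
    """Return (old_path, new_path) pairs that look like moves, based on content hashes.

    old: path -> hash as recorded at the application's last run.
    new: path -> hash as computed now.
    """
    vanished = defaultdict(list)  # hash -> paths that no longer exist
    for path, digest in old.items():
        if path not in new:
            vanished[digest].append(path)

    appeared = defaultdict(list)  # hash -> paths that are new since the last run
    for path, digest in new.items():
        if path not in old:
            appeared[digest].append(path)

    moves = []
    for digest, old_paths in vanished.items():
        new_paths = appeared.get(digest, [])
        # Only an unambiguous one-to-one match is treated as a move;
        # anything else is left for the user (or treated as delete + add).
        if len(old_paths) == 1 and len(new_paths) == 1:
            moves.append((old_paths[0], new_paths[0]))
    return sorted(moves)
```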

PS: Have you considered making your application a Windows service, so that the file management runs in the background regardless of whether the GUI part of your application is running?

Laoujin
  • Adding a Windows service as part of my application (suite) is indeed an interesting idea, because it can use the FileSystemWatcher class to observe changes in the file system permanently. However, I do not know whether it can be used in combination with Samba and Unix/Linux file systems (which we also have to support). – Dev.Jaap Aug 31 '12 at 14:39
  • We already store file names and paths in a database, but the great challenge is to match changes made from the outside world, e.g. the Windows file browser. – Dev.Jaap Aug 31 '12 at 14:41
  • Storing a hash (and perhaps the creation/modified date) along with the file name would help you track changes in files (same location, different hash and modified date) and file moves (old file is gone, and a new file with the same hash appeared). The problem is files that are both moved and have their content changed. I don't see how you could keep track of that (unless those files have some sort of 'header' content that would remain unchanged). – Laoujin Aug 31 '12 at 14:50
  • I agree; the great challenge is files that are both moved and changed. – Dev.Jaap Aug 31 '12 at 15:06