
I have multiple Windows programs (running on Windows 2000, XP and 7), which handle text files of different formats (csv, tsv, ini and xml). It is very important not to corrupt the contents of these files during file I/O. Every file should be safely accessible by multiple programs concurrently, and should be resistant to system crashes. This SO answer suggests using an in-process database, so I'm considering using the Microsoft Jet Database Engine, which is able to handle delimited text files (csv, tsv) and supports transactions. I have used Jet before, but I don't know whether Jet transactions really tolerate unexpected crashes or shutdowns in the commit phase, and I don't know what to do with non-delimited text files (ini, xml). I don't think it's a good idea to try to implement fully ACID file I/O by hand.

What is the best way to implement transactional handling of text files on Windows? I have to be able to do this in both Delphi and C#.

Thank you for your help in advance.

EDIT

Let's see an example based on @SirRufo's idea. Forget about concurrency for a second, and let's concentrate on crash tolerance.

  1. I read the contents of a file into a data structure in order to modify some fields. When I'm in the process of writing the modified data back into the file, the system can crash.

  2. File corruption can be avoided if I never write the data back into the original file. This can be easily achieved by creating a new file, with a timestamp in the filename every time a modification is saved. But this is not enough: the original file will stay intact, but the newly written one may be corrupt.

  3. I can solve this by putting a "0" character after the timestamp, which would mean that the file hasn't been validated. I would end the writing process by a validation step: I would read the new file, compare its contents to the in-memory structure I'm trying to save, and if they are the same, then change the flag to "1". Each time the program has to read the file, it chooses the newest version by comparing the timestamps in the filename. Only the latest version must be kept, older versions can be deleted.

  4. Concurrency could be handled by waiting on a named mutex before reading or writing the file. When a program gains access to the file, it must start with checking the list of filenames. If it wants to read the file, it will read the newest version. On the other hand, writing can be started only if there is no version newer than the one read last time.

This is a rough, oversimplified, and inefficient approach, but it shows what I'm thinking about. Writing files is unsafe, but maybe there are simple tricks like the one above which can help to avoid file corruption.
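Steps 2–3 above could be sketched roughly as follows. This is only an illustration of the idea, not a vetted design: the filename scheme (`name.<timestamp>.0` while unvalidated, renamed to `.1` once the readback matches), the helper names, and the added `fsync` call are all assumptions introduced for the example.

```python
import os
import time

def save_version(directory, basename, data):
    """Write data to a new timestamped file, validate it, then flag it as good.

    The trailing ".0" means "not yet validated"; renaming to ".1" is the
    validation flag described in step 3.
    """
    stamp = str(time.time_ns())
    unvalidated = os.path.join(directory, f"{basename}.{stamp}.0")
    with open(unvalidated, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before validating
    # Validation step: read the new file back and compare to what we meant to write.
    with open(unvalidated, "rb") as f:
        if f.read() != data:
            raise IOError("validation failed; previous version kept")
    validated = os.path.join(directory, f"{basename}.{stamp}.1")
    os.rename(unvalidated, validated)  # flip the "validated" flag via rename
    return validated

def load_newest(directory, basename):
    """Pick the newest *validated* version by the timestamp in the filename."""
    versions = [n for n in os.listdir(directory)
                if n.startswith(basename + ".") and n.endswith(".1")]
    if not versions:
        return None
    newest = max(versions, key=lambda n: int(n.split(".")[-2]))
    with open(os.path.join(directory, newest), "rb") as f:
        return f.read()
```

A reader always calls `load_newest`, so a crash mid-write leaves at worst an unvalidated `.0` file behind, which is simply never selected (and can be garbage-collected later along with superseded `.1` versions).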

UPDATE

Open-source solutions, written in Java:

kol
    Text files are a very poor idea if you need to access data in this way, and "transactional support" typically means opening the file in some exclusive mode. Trying to synchronize all that mess is going to give you vast headaches - if you really try to implement this, let me know in advance so I can buy stock in the company that makes your choice of painkillers. :-) You really should look for a better solution than trying to support transactional text files. – Ken White Dec 06 '12 at 00:07
  • Please comment on "safely accessible by multiple programs concurrently" - read only or you want to somehow allow read-write access? – Alexei Levenkov Dec 06 '12 at 00:11
  • @KenWhite OK :) The problem is, I must use text files, I cannot switch to, for example, MS SQL Compact databases. I already had file corruptions due to crashes, and when the OS ran out of resources due to some reason. – kol Dec 06 '12 at 00:12
  • :-) You've proven my point. You're never going to avoid the file corruptions due to things like crashes, unhandled exceptions, power outages, or any of the million other things that can go wrong via "transactional" text files (because there is no such thing). `Jet` supports "transactions", but not real ones, and I'm not sure that support includes the text driver (and MSAccess transactions aren't really solid for multi-users). @Alexei: Obviously it's not read-only, because if it were there would be no chance of corruption (and there would be no "transactions" to protect). – Ken White Dec 06 '12 at 00:17
  • @AlexeiLevenkov I must allow concurrent read-write access. But since I have relatively small files (<1 MB), the file operations are fast, so even locking wouldn't cause problems - my programs could wait for each other's operations to finish. – kol Dec 06 '12 at 00:19
  • Then you can try mutexes - but you have to use them in every program that works on those files – Sir Rufo Dec 06 '12 at 00:20
  • @SirRufo Yes, but my main problem is not concurrent access, but the file corruptions due to system failures. I would like file operations to be atomic. For example, if I'm in the middle of writing a file, and the system crashes, then when the system restarts itself, I would like to find the original file, without any remnants of the failed writing attempt. – kol Dec 06 '12 at 00:24
  • Write the file with e.g. extension *.$$$; if I/O is successful, rename the original file to *.old and *.$$$ to the original filename. In case of a system crash you will have to look for 2 states: 1) the original file exists, everything is OK; 2) the original file does not exist, so rename the *.old file to the original name – Sir Rufo Dec 06 '12 at 00:32
  • Answer: use any source control system - someone else already solved all your issues. Side note: "concurrent ... could wait for each other" - that seems to be a non-traditional definition of concurrent. – Alexei Levenkov Dec 06 '12 at 00:38
  • See [TxF](http://msdn.microsoft.com/en-us/library/windows/desktop/aa363764(v=vs.85).aspx) – Ondrej Kelle Dec 06 '12 at 00:39
  • But if only your programs access these files, why not use a transaction-safe approach (a database)? And there is a lot more out there than SQL Express / MS SQL Compact – Sir Rufo Dec 06 '12 at 00:39
  • @TOndrej I use Win2k and WinXP, so unfortunately TxF is not an option. – kol Dec 06 '12 at 00:41
  • @SirRufo I incorporated your file-renaming idea into the question. If you have a more-or-less full solution based on this idea, then please consider writing an answer. – kol Dec 06 '12 at 00:53
  • The ideas in your itemised list don't work. You should use a transactional database. That will work. – David Heffernan Dec 06 '12 at 07:30
  • @DavidHeffernan Thank you, David, but do you think Jet Engine is trustworthy? Unfortunately I must use text files because my programs are parts of a large system, which cannot be modified. – kol Dec 06 '12 at 08:21
  • @AlexeiLevenkov Source version control... interesting idea, thank you! – kol Dec 06 '12 at 08:24
  • Jet is fine. I'd use something lighter and open source. Your approach won't work at all. – David Heffernan Dec 06 '12 at 08:57
  • @DavidHeffernan If you know "something lighter and open source", please consider writing an answer. I would also like to know why my suggested approach won't work exactly. I don't want to defend it, just want to learn :) Thank you in advance! – kol Dec 06 '12 at 09:24
  • Proper transaction support is way more complex than that. I don't know how to do it, but I know it's complex. That's why we use tools made by people that know how to do it. There are oodles of good databases out there. Not sure why you would choose Jet! I'd give up using text files though. Use a real database. – David Heffernan Dec 06 '12 at 09:28
  • Firebird Embedded, SQLite, NexusDB - a few embedded SQL databases with transaction support, yet I dunno if they support concurrent access to the same file; that goes against the idea of embedded, exclusively-owned databases. Text files... write the number "1234" instead of "123" in the 1st row - and you'd HAVE to re-write the rest of the file. And you would not be able to do it, for other readers would lock their regions of the file and deny your attempts. Even binary BDE Paradox had frequent db corruptions. – Arioch 'The Dec 06 '12 at 12:05
  • Are you secretly working on your own Kickstarter home-made RDBMS? – Adrian Salazar Dec 13 '12 at 10:43
  • To the people saying "use an RDBMS": everything comes at a price - not only license cost, but the footprint of the solution, the burden of version dependency, the skills you require initially and in maintenance, ... And even DBMSs don't have magic built in; they have to trade off between performance and durability – Markus May 22 '15 at 07:25
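Sir Rufo's two-state rename scheme from the comments above could be sketched like this. The helper names and the startup `recover` step are illustrative assumptions; note also that a plain rename is not guaranteed to be crash-atomic on every Windows file system, so this remains a best-effort sketch rather than real transactionality.

```python
import os

def safe_write(path, data):
    """Write data next to `path`, then swap the new file in with renames.

    States to check after a crash:
      1) `path` exists                 -> the last write completed (or never
                                          started touching the original): OK.
      2) `path` missing, `.old` exists -> we crashed mid-swap: `recover`
                                          restores the `.old` copy.
    """
    tmp = path + ".$$$"
    old = path + ".old"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # the new data must hit disk before we touch `path`
    if os.path.exists(old):
        os.remove(old)
    if os.path.exists(path):
        os.rename(path, old)  # keep the previous version as *.old
    os.rename(tmp, path)      # publish the new version under the original name

def recover(path):
    """Run at startup: if the original is gone, fall back to the *.old copy."""
    old = path + ".old"
    if not os.path.exists(path) and os.path.exists(old):
        os.rename(old, path)
```

Every program touching the file would call `recover` on startup (ideally while holding the named mutex mentioned above) before reading or writing.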

6 Answers


How about using NTFS file streams? Write multiple named (numbered or timestamped) streams to the same filename. Every version could be stored in a different stream, but is actually stored in the same "file" or bunch of files, preserving the data and providing a roll-back mechanism... when you reach a point of certainty, delete some of the previous streams.

Streams were introduced in NT 4, so they cover all your target versions. The approach should be crash-proof: you will always have the previous version/stream, plus the original, to recover or roll back to.

Just a late night thought.

http://msdn.microsoft.com/en-gb/library/windows/desktop/aa364404%28v=vs.85%29.aspx
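A named NTFS stream is addressed with the documented `file:stream` path syntax. The helpers below are an illustrative sketch of the versioning idea from this answer; the actual write only works on an NTFS volume under Windows, which the code checks for.

```python
import os

def stream_path(filename, stream):
    """Build the path of a named NTFS alternate data stream.

    On NTFS, "file.txt:v2" addresses a stream named "v2" stored inside
    file.txt; ordinary tools see only the unnamed (main) stream.
    """
    return f"{filename}:{stream}"

def write_version(filename, version, data):
    """Store one version of the content in its own named stream (NTFS only)."""
    if os.name != "nt":
        raise OSError("alternate data streams require Windows/NTFS")
    with open(stream_path(filename, version), "wb") as f:
        f.write(data)
```

Rolling back would then mean copying an older stream's contents back over the main stream, and cleanup means deleting superseded streams. One caveat worth knowing: alternate streams are silently dropped when a file is copied to a non-NTFS volume (e.g. FAT, or many network shares).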

Despatcher
  • I gave you the 50 points, but I have to emphasize that this is not an answer to my question. I got no real answer, so I would prefer not giving the bounty to anyone, but the system would have automatically given you half of it, so I decided to give you all the points. – kol Dec 15 '12 at 18:04
  • Well, thank you! But I did realise that doing it with streams would not be easy - just a possibly safer method than manipulating files with different names, as streams have some rather nice properties and could achieve the same thing. Cheers, good luck with it! – Despatcher Dec 15 '12 at 20:40

What you are asking for is transactionality, which is not possible without implementing the mechanisms of an RDBMS yourself, given your requirement:

"It is very important not to corrupt the content of these files during file IO"

Pick a DBMS.

Jack G.
  • Sorry, but your answer does not help. If I were able to use a database, I would use one. Unfortunately I can't. I'm sure I'm not alone with this problem - Microsoft wouldn't have developed TxF if there was no need for it. Unfortunately I cannot use TxF since the solution should work on Windows versions older than Vista. – kol Dec 10 '12 at 15:03
  • But you did mention considering using MS Jet Database Engine, which is a step in the direction of using a database. As far as TxF is concerned, "Microsoft is considering deprecating TxF APIs in a future version of Windows", not a good prospect - http://msdn.microsoft.com/en-us/library/windows/desktop/hh802690(v=vs.85).aspx – Jack G. Dec 10 '12 at 18:33
  • @JGonzalez Yes, but Jet is a special embedded DBMS, which is able to treat directories as databases and files as tables. Regarding TxF, I've already mentioned its future deprecation (along with that link) in a comment above. Thank you for your help anyway. – kol Dec 10 '12 at 19:24
  • Could you explain why you can't use a database? – Josir Dec 14 '12 at 21:11
  • @Josir Because the files are part of a larger system I cannot modify. I would use TxF, but my programs should run on Windows 2000 and XP, too. Another option would be the Jet Engine, but it can only handle delimited files, and I also have ini and xml files. – kol Dec 16 '12 at 14:35

See the related post Accessing a single file with multiple threads. My opinion, however, is to use a database like RavenDB for this kind of transaction: RavenDB supports concurrent access to the same file, as well as batching multiple operations into a single request. However, everything is persisted as JSON documents, not text files. It supports .NET/C# very well, including JavaScript and HTML, but not Delphi.

mashtheweb

First of all, this question has nothing to do with C# or Delphi. You have to simulate your file structure as if it were a database.

Assumptions:

  • Moving files is a cheap operation, and the operating system guarantees that files are not corrupted during a move.

  • You have a single directory of files that need to be processed (d:\filesDB\*.*).

  • A Controller application is a must.

Simplified Worker Process:

-initialization

  1. Gets a processID from the Operating system.
  2. Creates directories in d:\filesDB

    d:\filesDB\<processID>
    d:\filesDB\<processID>\inBox
    d:\filesDB\<processID>\outBox
    

-process for each file

  1. Select file to process.
  2. Move it to the "inBox" Directory (ensures single access to file)
  3. Open file
  4. Create new file in "outBox" and close it properly
  5. Delete file in "inBox" Directory.
  6. Move the newly created file located in "outBox" back to d:\filesDB

-finalization

  1. remove the created directories.

Controller Application

Runs only on startup of the system, and initializes applications that will do the work.

  1. Scan d:\filesDB directory for subdirectories,
  2. For each subdirectory:
     2.1. If a file exists in "inBox", move it to d:\filesDB and skip "outBox".
     2.2. If a file exists in "outBox", move it to d:\filesDB.
     2.3. Delete the whole subdirectory.
  3. Start each worker process that need to be started.

I hope that this will solve your problem.
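The per-file cycle of the worker process above could be sketched as follows. The directory layout follows the answer; the `transform` callback is an illustrative stand-in for whatever processing the real program does.

```python
import os
import shutil

def process_file(files_db, proc_id, filename, transform):
    """One pass of the worker: claim a file, process it, publish the result."""
    inbox = os.path.join(files_db, proc_id, "inBox")
    outbox = os.path.join(files_db, proc_id, "outBox")
    os.makedirs(inbox, exist_ok=True)    # per-process directories (initialization)
    os.makedirs(outbox, exist_ok=True)

    claimed = os.path.join(inbox, filename)
    # 2. Move to inBox: claiming the file guarantees single access to it.
    shutil.move(os.path.join(files_db, filename), claimed)
    with open(claimed, "rb") as f:       # 3. open
        data = f.read()
    out = os.path.join(outbox, filename)
    with open(out, "wb") as f:           # 4. create the result in outBox, close properly
        f.write(transform(data))
    os.remove(claimed)                   # 5. delete the inBox copy
    # 6. Publish the finished file back to the shared directory.
    shutil.move(out, os.path.join(files_db, filename))
```

Because every state transition is a move, the controller can tell from the directory contents alone how far a crashed worker got: a file still in "inBox" means the result was not finished (roll back by moving it home), while a file in "outBox" is complete (roll forward by publishing it).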

Ali Avcı
  • "First of all this question has nothing to do with C# or Delphi." I agree. I think it also has nothing to do with Windows. This is a much more general problem. – kol Dec 10 '12 at 15:12

You are creating a nightmare for yourself trying to handle these transactions and states in your own code across multiple systems. This is why Larry Ellison (Oracle CEO) is a billionaire and most of us are not. If you absolutely must use files, then set up an Oracle or other database that supports LOB and CLOB objects. I store very large SVG files in such a table for my company so that we can add and render large maps in our systems without any code changes. The files can be pulled from the table and passed to your users in a buffer, then returned to the database when they are done. Set up the appropriate security and record locking, and your problem is solved.

Jeff D.
  • _This is why Larry Ellison (Oracle CEO) is a billionaire and most of us are not._ This might be the way of @kol to become a billionaire :D – Ali Avcı Dec 17 '12 at 07:32
  • That was weird - I just tried to upvote this answer and then realized it was myself from 5 years ago - :-) – Jeff D. May 18 '17 at 21:30

OK, you are dead - unless you can drop XP. Simple as that.

Post-XP Windows supports Transactional NTFS, though it is not exposed to .NET natively (you can still use it). It allows one to roll back or commit changes on an NTFS file system, with a DTC even in coordination with a database. Pretty nice. On XP, though - no way, it's not there.

Start with Any real-world, enterprise-grade experience with Transactional NTFS (TxF)? as a starter. The question there lists a lot of resources to get you started on how to do it.

Note that this DOES have a performance overhead - obviously. It is not that bad, though, unless you need a SECOND transactional resource: there is a very thin kernel-level transaction coordinator, and transactions only get promoted to full DTC when a second resource is added.

For a direct link - http://msdn.microsoft.com/en-us/magazine/cc163388.aspx has some nice information.

TomTom
  • Nice try, but the target OSes were Windows 2000, XP and above. I also had that in mind, but it fails because of these target OSes. It's a nice answer, but not to this question – Sir Rufo Dec 08 '12 at 19:05
  • Then you are seriously dead. You can STILL simulate transactions there - use DTC, a compensating resource manager, copy files off and then have the new versions copied over the old ones by the resource manager - that gives you transactional integrity. But it will be a LOT of work and it will be VERY dangerous - a small error, and some changes happen outside of transactions. – TomTom Dec 08 '12 at 19:09
  • @TomTom Thank you. I know this is a lot of work. But why hasn't this problem been solved yet in a general way? Everyone uses files, and there are cases when losing the content of these files is unacceptable. Is it really the only solution to use a relational database in these cases? I can't believe it... I understand that TxF is a solution, but what about older Windows versions, Linux, Mac OS, iOS, Android etc.? I would think that there *must* be a general solution for safe file handling. – kol Dec 09 '12 at 15:54
  • Well, it HAS been solved, sort of, with Transactional NTFS - that this is not exposed in .NET is sad, but can be worked around. That you still have very old OSes is another point. There cannot be a general solution - handling transactions properly in a non-transactional store is - well - brutally hard. A CRM (Compensating Resource Manager) is as good as it gets. Most programmers have no real idea what a transaction's "edge cases" are to start with. – TomTom Dec 09 '12 at 16:20
  • @TomTom "While TxF is a powerful set of APIs, there has been extremely limited developer interest in this API platform since Windows Vista primarily due to its complexity and various nuances which developers need to consider as part of application development. As a result, **Microsoft is considering deprecating TxF APIs** in a future version of Windows to focus development and maintenance efforts on other features and APIs which have more value to a larger majority of customers." Source: http://msdn.microsoft.com/en-us/library/windows/desktop/hh802690%28v=vs.85%29.aspx – kol Dec 09 '12 at 16:22
  • Says a lot about the developers, or? ;) – TomTom Dec 09 '12 at 16:23