
I have code that reads a lot of files in a loop from one location on disk and copies each file to another location. For each file copied, it also inserts a record (containing the filename and other info) into a CopiedFiles table in a SQL Server db (using Entity Framework). The destination path must include a folder whose name includes the id of the inserted CopiedFile (assigned by SQL Server at the moment of inserting), so I have to first insert into the db, then copy the file on disk.

I have the File.Copy() inside a try/catch block, so if there is any problem when copying a file, the loop just continues with the next file. Of course, this creates a problem: in some cases the input file cannot be copied (for example, a corrupted file that throws a cyclic redundancy error when trying to copy it). In these cases, the record has been inserted in the db but the file won't exist in the destination path on disk.

A simple solution would be to just delete the inserted record from the db when the exception is thrown. But I would like to keep this code so that it only does inserts in the db, not deletes...

So I was thinking another solution could be to check the file before inserting the row in the db, to make sure I will be able to copy it without errors. Now I am thinking of possible ways to do this check...

  • one would be to actually copy the file to a temporary location, then if this goes ok, delete it from this temporary location, then save the row in the db, then copy the original file to the destination. But of course, this would make the process take twice as long (and there are usually a lot of files to copy, several thousand).

Could there be any other (more efficient) ways to check this?

patsy2k
    you can insert in a transaction and commit it when file copy is successful, and otherwise rollback – siggemannen Jul 05 '23 at 14:41
    Alternatively to @siggemannen's suggestion, you could insert a record that the copy was started, and then update the record with the result of the operation. – Matthew Jul 05 '23 at 14:43
  • Next alternative change the id to a guid or something like that instead of a identity field so that it can be precreated. Then first copy the file and create the db entry afterwards on successful copying. – Ralf Jul 05 '23 at 14:46
  • I do think it's simpler to just delete on failure though =) one doesn't want to lock the db for too long – siggemannen Jul 05 '23 at 14:52
  • @siggemannen Thanks, that's the correct answer indeed. Not sure why, but I had the idea that you couldn't manually do transactions in Entity Framework (only that db.SaveChanges() itself was wrapped by EF in a transaction). After reading your comment, I searched on this and it's what I have finally done. All the answers here are really helpful and informative, but yours has been the best one, so if you post your comment as an answer I will choose it as the correct one. – patsy2k Jul 05 '23 at 15:40
  • @patsy2k , great, i've added the answer as well as a little caveat emptor which i feel i must mention :) – siggemannen Jul 05 '23 at 15:49
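Ralf's comment above (pre-creating the id as a Guid so the copy can happen before the insert) can be sketched roughly as follows. This is an illustrative sketch, not the question's actual code: `CopyThenRecord` and the `saveRecord` callback (standing in for the EF insert) are hypothetical names.

```csharp
using System;
using System.IO;

// Sketch of Ralf's suggestion: generate the id client-side (a Guid) so
// the destination folder name is known before anything touches the db.
// saveRecord is a hypothetical stand-in for the Entity Framework insert.
class GuidFirstCopy
{
    public static string CopyThenRecord(string sourcePath, string destinationRoot,
                                        Action<Guid, string> saveRecord)
    {
        var id = Guid.NewGuid();
        string destFolder = Path.Combine(destinationRoot, id.ToString("N"));
        Directory.CreateDirectory(destFolder);

        string destPath = Path.Combine(destFolder, Path.GetFileName(sourcePath));
        File.Copy(sourcePath, destPath);   // if this throws, no record exists yet

        saveRecord(id, destPath);          // only record after a successful copy
        return destPath;
    }
}
```

Because the id no longer comes from an identity column, the insert can safely happen last, so a failed copy never leaves an orphaned row.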

4 Answers


for example corrupted file that throws an cyclic redundancy error when trying to copy it

The standard file copying functionality doesn't do any CRC checks on copied data. That alone leads me to believe that file systems and streams in general are unfamiliar territory for you, so I will tailor my answer accordingly.

one would be to actually copy the file to a temporary location, then if this goes ok, delete it from this temporary location, then saving row in db, then copy original file to destination

Even if it didn't take longer, that doesn't guarantee anything. For example, if your actual destination file is opened by a program in a non-sharing mode, the temporary copy will succeed and the actual copy won't.

No, your system of inserting rows in your db before the copy happens is just broken. Insert your rows only after the file copies successfully. No record deletion necessary.

Also, you should store the failed files in an array somewhere and retry them after a few seconds. In my experience, the #1 cause of file copying failures (besides files open in external editors) is antivirus applications keeping them open on demand, and you need to retry after they're done. Of course, if a file fails a few times then feel free to give up on it.
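The retry idea above can be sketched like this (a minimal sketch: `CopyWithRetries` and the `copyOne` callback, standing in for the per-file insert + File.Copy, are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Sketch of the retry approach: keep the failures in a list and
// re-attempt them a few times with a short pause, then give up.
class RetryingCopier
{
    public static List<string> CopyWithRetries(IEnumerable<string> files,
                                               Action<string> copyOne,
                                               int maxAttempts = 3,
                                               int delayMs = 5000)
    {
        var failed = new List<string>(files);
        for (int attempt = 0; attempt < maxAttempts && failed.Count > 0; attempt++)
        {
            if (attempt > 0) Thread.Sleep(delayMs);  // give the antivirus time to let go
            var stillFailing = new List<string>();
            foreach (var file in failed)
            {
                try { copyOne(file); }
                catch (IOException) { stillFailing.Add(file); }
            }
            failed = stillFailing;
        }
        return failed;   // whatever is left never copied successfully
    }
}
```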

The best you can do to check if a file might copy successfully is to just open the destination for writing. Since you presumably don't want to lose its contents on a successful open, you can open it in append mode. But this is only a heuristic; nothing but actually copying the file will tell you whether the copy will succeed or not.
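The append-mode heuristic described above might look like this (an illustrative sketch; `CanProbablyWrite` is a hypothetical name, and note that `FileMode.Append` will create the destination file if it doesn't exist yet):

```csharp
using System;
using System.IO;

// Try to get a writable handle on the destination without truncating it.
// A successful open rules out some obvious failures (locks, permissions)
// but, as the answer says, proves nothing about the copy itself.
class WriteProbe
{
    public static bool CanProbablyWrite(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Append, FileAccess.Write))
            {
                return true;
            }
        }
        catch (IOException) { return false; }
        catch (UnauthorizedAccessException) { return false; }
    }
}
```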

Blindy
  • "The standard file copying functionality doesn't do any CRC checks" the disk subsystem does though, and all manner of other things may go wrong in the middle as well. – Charlieface Jul 05 '23 at 15:08
  • The disk subsystem also fixes its own problems of CRC communication failure as part of the DMA protocol. We're talking about an abstraction layer multiple times above it, and short of a broken HDD or RAM sticks you'll never interact with that in your application. But yes, you are correct, there is redundancy across all hardware communication. – Blindy Jul 05 '23 at 15:12

So I was thinking another solution could be, before inserting the row in db, checking the file to make sure I will be able to copy it without errors.

Nope. Don't check if the file will copy. There are all kinds of problems with that plan.* Wait to insert to the DB until afterwards and instead first check if the file has copied. Something more like this:

try
{
    CopyFile();
    if (ValidateCopiedFile())
    {
        InsertDBRecord();
    }
    else
    {
        LogError();
    }
}
catch (Exception ex)
{
    LogError(ex);
}

We could also perhaps simplify this by making the validate method throw instead of returning false:

try
{
    CopyFile();
    ValidateCopiedFile();
    InsertDBRecord();
}
catch (Exception ex)
{
    LogError(ex);
}

But of course this opens the concern of using exception handling for logical program flow. It's only the correct way if failing validation really is, well, exceptional.


* While the linked text speaks in terms of file existence and opening, the logic applies to any use of the file system, including copying.
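The `ValidateCopiedFile` call in the snippets above is left abstract; one deliberately cheap way to fill it in (my sketch, not the answerer's code) is to compare file lengths. A length match is necessary but not sufficient; compare hashes or bytes if you need a real contents guarantee.

```csharp
using System;
using System.IO;

// Minimal sketch of a ValidateCopiedFile implementation: the destination
// must exist and match the source's length. This catches truncated or
// missing copies, but not same-length corruption.
class CopyValidator
{
    public static bool ValidateCopiedFile(string source, string destination)
    {
        var dst = new FileInfo(destination);
        return dst.Exists && dst.Length == new FileInfo(source).Length;
    }
}
```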

Joel Coehoorn
  • You might want to do a `try { DeleteCopiedFile(); } catch {}` inside the `catch` in order to cleanup in the event of an exception. – Charlieface Jul 05 '23 at 15:07

You can do the database insert in a transaction and commit it when file copy is successful, and otherwise rollback. Something like this pseudo-code:

try
{
    beginTransaction();
    string newName = addFileToDb(fileName);   // insert the row, get the generated name back
    copyFile(fileName, newName);
    commitTransaction();
}
catch (Exception ex)
{
    rollbackTransaction();
}

But there's a bit of a caveat I must mention: the file copy operation might take some time, and while it's being performed you might be holding locks on the table participating in the transaction, which might affect other operations. So it might be more prudent to actually delete the rows from the database instead of using a transaction.
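With Entity Framework the pseudo-code above maps onto the explicit transaction API, `Database.BeginTransaction()` (available in both EF6 and EF Core). A hedged sketch, not a drop-in implementation: `db`, `CopiedFile`, `fileName` and `sourcePath` follow the question, `destinationRoot` is a hypothetical name.

```csharp
// Sketch of the accepted approach with an explicit EF transaction.
// SaveChanges runs the INSERT and populates the identity Id even
// though the transaction is not yet committed.
using (var tx = db.Database.BeginTransaction())
{
    try
    {
        var record = new CopiedFile { FileName = fileName };
        db.CopiedFiles.Add(record);
        db.SaveChanges();                 // record.Id is assigned here

        string destFolder = Path.Combine(destinationRoot, record.Id.ToString());
        Directory.CreateDirectory(destFolder);
        File.Copy(sourcePath, Path.Combine(destFolder, fileName));

        tx.Commit();                      // copy succeeded: keep the row
    }
    catch
    {
        tx.Rollback();                    // copy failed: the insert disappears
        // log and continue with the next file
    }
}
```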

siggemannen

You won't know if there are any issues until you try the copy; the main issues you will come up against are things like disk space, permissions and invalid paths.

You could check the space, permissions, paths, etc., but these are all expensive operations. However, with reasonable assumptions about space and likely permissions (or basic validations), you could also try to copy a small "known" test file to check whether the location works (you only need to do this periodically for each distinct location), and then copy the larger files on the basis that it does.

Apart from basic validation, I would probably just try all the operations and log / roll back any changes if there are failures.
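The "known small test file" probe described above can be sketched like this (`LooksWritable` is a hypothetical name; the probe file gets a unique name so concurrent runs don't collide):

```csharp
using System;
using System.IO;

// Write and delete a tiny marker file in the destination folder to
// flush out permission, path and (gross) space problems before
// starting the real copies. Cheap enough to repeat periodically.
class DestinationProbe
{
    public static bool LooksWritable(string folder)
    {
        try
        {
            string probe = Path.Combine(folder, ".copy-probe-" + Guid.NewGuid().ToString("N"));
            File.WriteAllBytes(probe, new byte[] { 0 });
            File.Delete(probe);
            return true;
        }
        catch
        {
            return false;
        }
    }
}
```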

Mark Redman