
I have the following code that takes about an hour to run through a few hundred thousand rows:

    public void Recording(int rowindex)
    {
        using (OleDbCommand cmd = new OleDbCommand())
        {
            try
            {
                using (OleDbConnection connection = new OleDbConnection(Con))
                {
                    cmd.Connection = connection;
                    connection.Open();
                    using (OleDbTransaction Scope = connection.BeginTransaction(SD.IsolationLevel.ReadCommitted))
                    {
                        try
                        {
                            string Query = @"UPDATE [" + SetupAction.currentTable + "] set Description=@Description, Description_Department=@Description_Department, Accounts=@Accounts where ID=@ID";
                            cmd.Parameters.AddWithValue("@Description", VirtualTable.Rows[rowindex][4].ToString());
                            cmd.Parameters.AddWithValue("@Description_Department", VirtualTable.Rows[rowindex][18].ToString());
                            cmd.Parameters.AddWithValue("@Accounts", VirtualTable.Rows[rowindex][22].ToString());
                            cmd.Parameters.AddWithValue("@ID", VirtualTable.Rows[rowindex][0].ToString());
                            cmd.CommandText = Query;
                            cmd.Transaction = Scope;
                            cmd.ExecuteNonQuery();
                            Scope.Commit();
                        }
                        catch (OleDbException odex)
                        {
                            MessageBox.Show(odex.Message);
                            Scope.Rollback();
                        }
                    }
                }
            }
            catch (OleDbException ex)
            {
                MessageBox.Show("SQL: " + ex);
            }
        }
    }

It works as I expect it to. However, today the computer crashed while my program was running the query (in a for loop where rowindex is the index into a DataTable), and when I restarted the program it said:

"Multi-step OleDB operation generated errors", followed by my connection string.

What happened is that the database is now entirely uninteractable; even Microsoft Access's recovery methods can't seem to help here.

I've read that this may happen when the data structure of the database is altered from what the engine expects it to be. My question is: how do I prevent this, given that I can't really detect when my program is about to stop functioning all of a sudden?

There could be a way for me to restructure it somehow, or maybe there's a function I don't know about. Perhaps the program is sending something like an empty query when the crash happens, but I don't know how to stop it.

Aroueterra
  • This code looks extremely inefficient. Not only is it repeatedly opening and closing the DB connection for every single query execution, but also recreating the command with all its parameters. The execution time would be drastically reduced if you open and maintain the connection and also create the command object once. Then just reassign the parameter values and re-execute. It also seems rather unnecessary to start and commit a transaction for one single query which updates one row of data. Such a simple operation will fail or succeed on its own without need for a transaction. – C Perkins Aug 02 '17 at 14:05
  • Also, please be specific about "access's recovery methods". Either list the various steps you attempted or perhaps share a link to resources with such steps. – C Perkins Aug 02 '17 at 14:35
  • @CPerkins So, 1. Open connection for the duration of the loop 2. Assign command object outside the loop, 3. Assign parameters within the loop. 4. Do away with my transaction commits? I always wondered if this had an impact on performance, but my professor suggested adding them to every query, like a try catch block. When I tried opening the corrupted file, microsoft began its file recovery function, that takes the data and adds corrupted cells as a new table, except it took forever on the dataset and seemed to crash on its own... Not sure if that's a local machine problem, or access's. – Aroueterra Aug 03 '17 at 00:40
  • Don't just "do away with ...commits". If a transaction is started at all (i.e. `connection.BeginTransaction`) then you **must** commit or rollback, otherwise the database will be in an inconsistent state and updates will not be saved. Your professor is teaching good habits, and for any set of multiple operations or a query that affects many records, a transaction ensures that either everything gets committed together or nothing does. But for a _single query_ that updates a _single row_, a separate transaction is probably unnecessary since a failure does not need to roll back other records. – C Perkins Aug 03 '17 at 00:51
  • Instead, ask yourself whether the entire loop should be in one transaction--are the individual updates dependent or independent? In other words, if a single update fails, do you want the many thousands of other updates to be rolled back? If you do want this behavior, then you start the transaction outside the loop at the same time you create the command object, then you commit all changes at the end of the loop. If even one update fails, then you exit the loop immediately and roll back all changes. This would of course require running the entire loop over after you fix any problem. – C Perkins Aug 03 '17 at 00:58
  • Sorry for the excessive comments about transactions, but I must qualify my statements a bit. First of all, regarding there being no need for a transaction around a single-row update, this assumes that there are no Data Macros (or triggers, as they would be called on other DB systems) that might make additional changes to the database. Using proper terms, updating a single row is already an atomic operation. You can't separate such an update into multiple parts... it either updates the single row or it doesn't. – C Perkins Aug 03 '17 at 01:12
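
A minimal sketch of the restructuring suggested in the comments above, with the connection and command created once and only the parameter values reassigned per row. `Con`, `VirtualTable`, and `SetupAction.currentTable` come from the question; the parameter types/sizes and the loop shape are assumptions:

// Sketch only (requires: using System.Data.OleDb;).
// Opens the connection and builds the command once, then reuses the
// same parameter objects for every row of the DataTable.
public void RecordAll()
{
    string query = "UPDATE [" + SetupAction.currentTable + "] " +
                   "SET Description = ?, Description_Department = ?, Accounts = ? WHERE ID = ?";

    using (OleDbConnection connection = new OleDbConnection(Con))
    using (OleDbCommand cmd = new OleDbCommand(query, connection))
    {
        // OleDb parameters are positional; the names below are only labels.
        // VarWChar/255 is an assumption about the column types.
        cmd.Parameters.Add("@Description", OleDbType.VarWChar, 255);
        cmd.Parameters.Add("@Description_Department", OleDbType.VarWChar, 255);
        cmd.Parameters.Add("@Accounts", OleDbType.VarWChar, 255);
        cmd.Parameters.Add("@ID", OleDbType.VarWChar, 255);

        connection.Open();

        for (int rowindex = 0; rowindex < VirtualTable.Rows.Count; rowindex++)
        {
            cmd.Parameters[0].Value = VirtualTable.Rows[rowindex][4].ToString();
            cmd.Parameters[1].Value = VirtualTable.Rows[rowindex][18].ToString();
            cmd.Parameters[2].Value = VirtualTable.Rows[rowindex][22].ToString();
            cmd.Parameters[3].Value = VirtualTable.Rows[rowindex][0].ToString();
            cmd.ExecuteNonQuery();
        }
    }
}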

2 Answers


The Jet/ACE database engine already attempts to avoid corruption and to automatically recover from catastrophic events (lost connections, the computer crashing). Transactions can further protect against inconsistent data by committing (or discarding) multiple operations together. But eventually there may be some coincidental system failure which terminates an operation at a critical write position, thereby creating critical inconsistencies in the database file. Making regular and timely backups is part of an overall solution. For very long operations it might be worth making an automated copy of the entire database file prior to the operation.
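
For example, a minimal sketch of such an automated copy before a long run; the file path and naming scheme are assumptions, and all connections to the file should be closed first:

// Sketch: copy the .accdb aside before starting the long batch of updates,
// so a mid-run crash can be recovered from the copy. Path is illustrative.
string dbPath = @"C:\Data\MyDatabase.accdb";
string backupPath = dbPath + "." + DateTime.Now.ToString("yyyyMMdd_HHmmss") + ".bak";
System.IO.File.Copy(dbPath, backupPath, overwrite: false); // throws if the backup name already exists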

Otherwise, an extreme alternative is to

  1. Create a second intermediate database into which all data is first inserted. (Only needs to be done once.)
  2. In this intermediate database, create linked tables to relevant tables in the permanent, working database.
  3. Also in the intermediate database, create an indexed local table that mirrors the linked table structure into which data will be inserted. OR if the intermediate database and table already exist, clear the local table (i.e. delete all rows).
  4. Have your current software insert into the local intermediate table.
  5. Run a single query which then updates the linked table from the temporary table. Wrap that update in a transaction.
    • Here's where the linked table has the benefit that it can be referenced in an SQL query just like any local table; you only have to explicitly open the intermediate database. In other words, just perform a simple query like `UPDATE LocalTable INNER JOIN LinkedTable ON LocalTable.UpdateID = LinkedTable.ID SET LinkedTable.Data = LocalTable.Data`.
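
As a rough sketch of step 5, assuming an ACE connection string and the illustrative table/column names from the query above:

// Sketch: after the rows have been inserted into LocalTable inside the
// intermediate database, one joined UPDATE pushes them into the linked table.
// The connection string and file path are assumptions.
string intermediateCon =
    @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Intermediate.accdb;";

using (OleDbConnection connection = new OleDbConnection(intermediateCon))
{
    connection.Open();
    using (OleDbTransaction tx = connection.BeginTransaction())
    using (OleDbCommand cmd = new OleDbCommand(
        "UPDATE LocalTable INNER JOIN LinkedTable " +
        "ON LocalTable.UpdateID = LinkedTable.ID " +
        "SET LinkedTable.Data = LocalTable.Data",
        connection, tx))
    {
        cmd.ExecuteNonQuery();
        tx.Commit();
    }
}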

The benefit of this process is that the single query that updates one Access table from another can be very fast, possibly much faster than the multiple update operations in your code. This could reduce the likelihood that errors in your update code will negatively affect your database. It doesn't completely eliminate the risk of a random computer crash affecting the database, but reducing the time during which connections are open and update queries are executing should make corruption less likely.

C Perkins

I think your catch block is wrong, because if you get an exception other than OleDbException, you will not roll back the transaction:

try
{
    // ...
    Scope.Commit();
}
catch (Exception ex)
{
    MessageBox.Show(ex.Message);
    Scope.Rollback();
}

That is, Exception instead of OleDbException. Exceptions can come from anywhere, not necessarily from OLE DB, and you still want to roll back everything you've done so far in that case.

That being said, if you have a few hundred thousand rows, I would seriously consider batching the update and processing just a few thousand rows per iteration, with a transaction per iteration.
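
A rough sketch of that batching, reusing `Con` and `VirtualTable` from the question; `UpdateRow` stands in for whatever method runs the parameterized UPDATE for one row, and the batch size is arbitrary:

// Sketch: process the DataTable in chunks, one transaction per chunk,
// so a failure only rolls back the current batch.
const int batchSize = 5000;

using (OleDbConnection connection = new OleDbConnection(Con))
{
    connection.Open();
    for (int start = 0; start < VirtualTable.Rows.Count; start += batchSize)
    {
        int end = Math.Min(start + batchSize, VirtualTable.Rows.Count);
        using (OleDbTransaction tx = connection.BeginTransaction())
        {
            try
            {
                for (int rowindex = start; rowindex < end; rowindex++)
                {
                    UpdateRow(connection, tx, rowindex); // hypothetical helper
                }
                tx.Commit();
            }
            catch (Exception)
            {
                tx.Rollback();
                throw; // earlier batches stay committed; only this one is undone
            }
        }
    }
}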

In terms of transactional behavior, the main question would be: do you really want to roll back everything you have updated so far in case of failure, or just retry/continue where you left off? If the answer is that you want to retry/continue, then you will likely want to create a BatchUpdateTask table or similar... with all the information you need for each iteration.
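
For illustration only (the table name and columns are made up), such a progress table could be created once and updated after every committed batch:

// Sketch: a one-time progress table so the loop can resume after a failure
// instead of starting from scratch.
using (OleDbConnection connection = new OleDbConnection(Con))
using (OleDbCommand cmd = new OleDbCommand(
    "CREATE TABLE BatchUpdateTask (BatchNumber INT, LastProcessedID TEXT(50), CompletedAt DATETIME)",
    connection))
{
    connection.Open();
    cmd.ExecuteNonQuery();
}
// After each committed batch, insert a row recording the batch number and the
// last ID processed; on restart, read the most recent row and continue from there.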

Andrés Robinet
  • Hmmm, I've always wondered when to deviate from Exception ex. I actually have that on every other try catch, but someone on SO said my catch exceptions have to be specific, in this case, it is OleDb. Not sure if access has a function like that, but I'll try to look for one. – Aroueterra Aug 02 '17 at 08:48
  • Yet another way to troubleshoot the problem is to rely on [Access stored procedures](https://msdn.microsoft.com/en-us/library/office/ff845861.aspx) But apparently you need 2013 for that – Andrés Robinet Aug 02 '17 at 08:53
  • @AndrésRobinet According to online docs, the CREATE PROCEDURE command is supported starting with version 2010. (Within Access, these are presented as Queries with parameters.) But how would this help troubleshoot? – C Perkins Aug 02 '17 at 14:23
  • @CPerkins I imagined a situation where a table is updated from another after some manipulation. Maybe that's not the case here, but usually, materializing source data in memory to update thousands of records is a performance hog. Doing as much work as possible in the SQL engine is, more often than not, faster and safer. This has nothing to do with transactions themselves, but I can imagine a program crashing because of a long running transaction. And you can use procedures and triggers for that purpose: https://stackoverflow.com/questions/3287545/how-do-i-make-a-stored-procedure-in-ms-access – Andrés Robinet Aug 03 '17 at 04:22
  • @AndrésRobinet Agreed. That is in fact the basis for my posted answer regarding an intermediate database table and update query, however I was assuming (naively) that the source data was not originally in an Access database (so the DB engine couldn't access the data directly). If the VirtualTable object in the OP's code already has its source in an Access database, then there are definitely more efficient means to update the table rather than inserting it row-by-row from memory. In that case the entire approach could probably be re-engineered and perhaps stored procedures could be useful then. – C Perkins Aug 03 '17 at 05:26