Populate new field in SQLite database using existing field value and a C# function

Question

I have a SQLite database for which I want to populate a new field based on an existing one. I want to derive the new field value using a C# function.

In pseudocode, it would be something like:

foreach ( record in the SQLite database)
{
my_new_field[record_num] = my_C#_function(existing_field_value[record_num]);
}

Having looked at other suggestions on StackOverflow, I'm using a SqliteDataReader to read each record, and then running a SQlite "UPDATE" command based on the specific RowId to update the new field value for the same record.

It works .... but it's REALLY slow and thrashes the hard drive like crazy. Is there really no better way to do this?

Some of the databases I need to update might be millions of records.

Thanks in advance for any help.

Edit:

In response to the comment, here's some real code in a legacy language called Concordance CPL. The important point to note is that you can read and write changes to the current record in one go:

int db;

cycle(db)
{
db->FIRSTFIELD = myfunction(db->SECONDFIELD);
}

myfunction(text input)
{
text output;

/// code in here to derive output from input

return output;
}

I have a feeling there's no equivalent way to do this in SQLite as SQL is inherently transactional, whereas Concordance allowed you to traverse and update the database sequentially.

... it looks like I should be able to get a performance increase by wrapping all of the updates in a single transaction. However, I think my question still stands as to whether the multiple update queries is the correct architecture for this at all. Thanks. — Andrew17856, Jul 25 '19 at 00:07
There may be any number of other ways, it is hard to tell from the fake pseudo code. — Ňɏssa Pøngjǣrdenlarp, Jul 25 '19 at 00:32
It's hard to tell why your previous code was so slow because your pseudo-pseudo-code was not useful (despite adding more abstract CPL). It revealed nothing of the actual C# objects or SQL statements... details that are **critical** to asking for help with performance, etc. Because sqlite is an in-process, embedded DBMS, it allows [custom functions](https://stackoverflow.com/questions/2108870/how-to-create-custom-functions-in-sqlite) to be defined so that you could do something like `UPDATE mytable SET FIRSTFIELD = myfunction(SECONDFIELD)`... very similar to the CPL snippet. — C Perkins, Jul 25 '19 at 08:06
Thanks. OK. I probably should have been more specific with my question. I wasn't having difficulty with the coding. My query was more sense-checking the principle: i.e. whether the *only* way to update all rows in a SQLite database is by multiple UPDATE commands referencing the different RowIDs. It seems the answer is "yes", since even if you use a SQLDataAdapter is still just creating UPDATE commands behind the scene. — Andrew17856, Jul 25 '19 at 08:34
p.s. I'm also aware of the ability to define custom functions in SQLite. I'm not aware that there's an interface that allows SQLite to use C# .net functions though (which is the brief here)? — Andrew17856, Jul 25 '19 at 09:14
https://stackoverflow.com/questions/172735/create-use-user-defined-functions-in-system-data-sqlite Sorry, I'm not able to dredge up my use of them right now with a more complete code sample, but here's a start for C# using system.data.sqlite (which can easily be added to a Visual Studio project using nuget). — C Perkins, Jul 25 '19 at 14:35
That's really helpful - thank you. That's another way to do it then. Create the c# function, bind it into SQLite, then execute an update command that calls back to it. — Andrew17856, Jul 25 '19 at 15:51

Andrew17856 · Accepted Answer · 2019-07-27T05:25:34.370

0

The answer to this is to wrap all of the updates into a single transaction.

There is an example here that does it for bulk inserts: https://www.jokecamp.com/blog/make-your-sqlite-bulk-inserts-very-fast-in-c/

In my case, it would be bulk updates based on RowID wrapped into a single transaction.

It's now working, and performance is many orders of magnitude better.

EDIT: per the helpful comment above, defining a custom C# function and then reference it in a single UPDATE command also works well, and in some ways is better than the above as you don't have to loop through within C# itself. See e.g. Create/Use User-defined functions in System.Data.SQLite?

edited Jul 27 '19 at 05:25

answered Jul 25 '19 at 05:55

Andrew17856

161
13

The fact that this helped to improve your code, great. But even the code in the linked page is still incomplete and does not use best practices. You could probably speed it up even more. The point of defining a SQLiteCommand object is to create a parameterized **prepared** (i.e. similar to pre-compiled statement) statement that you then only pass in the data and re-execute. The linked code is redefining the `cmd.CommandText` repeatedly... not efficient. Look into SQLiteCommand parameters. Simply wrapping in a transaction was likely not what gained the orders of magnitude in performance. – C Perkins Jul 25 '19 at 07:58
1

Thanks. Yes, I'm using parameters in my code. The command is defined in the "using" statement, and only the parameters are updated as it loops through each record. – Andrew17856 Jul 25 '19 at 08:41

Populate new field in SQLite database using existing field value and a C# function

1 Answers1