4

I'm writing a program that uses a sqlite3 database file to store its data. If I open a database file with

sqlite3_open_v2(filename, &db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, NULL)

the database file is created if it does not exist. How can I find out if the database file existed prior to opening it? The sqlite3 shell uses code like this:

  /* Go ahead and open the database file if it already exists.  If the
  ** file does not exist, delay opening it.  This prevents empty database
  ** files from being created if a user mistypes the database name argument
  ** to the sqlite command-line tool.
  */
  if( access(data.zDbFilename, 0)==0 ){
    open_db(&data, 0);
  }

This code however has a race condition when the database file is created by another process after the access call and before the open_db call (time-of-check vs. time-of-use).

Another answer (that I can't find right now) suggests to check the application_id and user_version fields. If they are zero, a database was just created. I researched this approach and found that numerous applications actually don't bother to set these fields in newly created databases, so this approach is fuzzy at best and I don't think it solves my problem.

Is there a method to find out if the database existed prior to opening it that does not involve a race condition like this? It is also acceptable if I can merely find out if the database file was initialized (as in, a truncated file was populated with a sqlite3 header) by sqlite3.

The purpose of having such a routine is to be able to find out if I need to create all the tables I need in the database. I don't want to accidentally overwrite another file placed there by a different application.

Sample code

A simplified illustration of the problem can be found in the following bash script. It simulates two applications. They both work in similar ways:

  1. Create or open a database test.db
  2. If the database did not exist before, create a table test and write a single row into it. The value is 1 for the first application and 2 for the second.
  3. Print the contents of the database.

Here is the code:

#!/bin/sh

# sleeps are inserted to make the race conditions easier to trigger

application() {
    echo Checking if database exists...
    [ ! -f test.db ]
    status=$?
    sleep 2

    if (exit $status)
    then
        echo Database not found, making tables
        sqlite3 test.db "CREATE TABLE IF NOT EXISTS test (a);"
        sleep 2
        echo Writing initial record into database
        sqlite3 test.db "INSERT INTO test VALUES ($1);"
        sleep 2
    else
        echo Database found, checking if it belongs to me
    fi

    echo Printing content of database
    sqlite3 test.db "SELECT * FROM test;"
}

rm -f test.db
echo First test: app1 and app1 race
application 1 & (sleep 1 ; application 1)

rm -f test.db
echo Second test: app2 and app1 race
application 2 & (sleep 1 ; application 1)

My goal is to ensure that the following cases can never happen:

  • An instance of application one opens the database file, concludes that it isn't initialized and initializes it (by creating the table and writing the initial record) even though it already contains data from a different instance of the same application or from a different application.
  • An database file belonging to a different application is written into.

If application was programmed correctly, each run would only initialize the database if it wasn't initialized before. Thus, the only output you would see is either a database containing only the row 1 or a database that contains only the row 2.

fuz
  • 88,405
  • 25
  • 200
  • 352
  • I have hard time understanding what the problem is. Why not just open the file and create tables. If tables exist, surely the create statement will fail? – user694733 Feb 22 '16 at 13:27
  • @user694733 What if I open a database from a different application that (by coincidence) uses the same table names I use? Then my application mysteriously fails later on or the database might get corrupted from me trying to write nonsensical stuff into it. What if two instances of my program are opened at the same time and try to initialize the same tables twice, writing duplicate initialization records or worse? – fuz Feb 22 '16 at 13:32
  • I had the same problem some time ago. At the end I developed such as _service daemon_ that unique access and manage things on the DB through a **UDP** socket connection. – LPs Feb 22 '16 at 13:33
  • It's been a while since I have used SQLite, but it should use exclusive locking for write operations. You could use transactions for create/init step to prevent 2 instances problem. For other app problem, why would you share the db directory with other unrelated application in the first place? However, if there is reason to do that, and it also uses SQLite, then locking should take care of that too. – user694733 Feb 22 '16 at 13:41
  • @user694733 There is no application directory. The database files are created where the user wants them. Think of it as a [file format](http://sqlite.org/appfileformat.html), not an internal application database. Locking is of course used, but the situation is not necessarily a race-condition where two processes access the same database at the same time. It's a [TOCTTOU](https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use) problem. – fuz Feb 22 '16 at 13:49
  • As page you linked says, preventing this seems very hard. Especially if you have no control over the other app that might overwrite file at any time. I guess the responsibility falls on the user. Anyway, I think question might too difficult to answer in it's current form. – user694733 Feb 22 '16 at 14:10
  • @user694733 If it wasn't reasonably difficult to answer, I wouldn't have asked the question as I would have found a solution on my own. – fuz Feb 22 '16 at 14:11
  • I meant that there are too many unknown variables at the moment. You need to provide some concrete example, MCVE. – user694733 Feb 22 '16 at 14:16
  • @user694733 It's hard to make an MCVE as there isn't any source code that is wrong. This question is not seeking debugging help. What are you exactly missing that I could add to the question? – fuz Feb 22 '16 at 14:19
  • As I commented, as far as it is applicable for you, think to centralize actions on DB. Other solution e.g. could be to create table with same name with a specific patter: Process_Name + PID + table name. In other wards makes all things unique. Then you have to join all tables to retrieve info on selects and so on. – LPs Feb 22 '16 at 14:24
  • @LPs This is not solving the problem I have either. Please try to stick to the original question instead of trying to solve a different problem. Please take as it is: I must make sure that I do not under any circumstances write into database files created by other applications. I must make sure that I do recognize a database file was created by myself (i.e. the application itself, but possibly a different process) and do not initialize it again. This must be free of race conditions. – fuz Feb 22 '16 at 14:26
  • Example of worst case situation, that should still be handled gracefully. Give boundary to question. Race conditions are hard to reproduce, but if nothing else, 2 artificial, minimal programs that reproduce the problem. Even if they have to use non-realistic delays to make problem evident. – user694733 Feb 22 '16 at 14:29
  • @user694733 I have given the two conditions I need to avoid in the last paragraph of my question and in my previous comment. If you need to have a concrete program that generates these two cases, well, okay. I can make that, even though I don't see what it adds. – fuz Feb 22 '16 at 14:31
  • My last comment, You are asking for a way to avoid that a different process could create a DB between a _check if exist_ and _create_ actions. If you write a DB manager, all process HAVE TO ask to that process to open the DB. Then all actions will be serialized and no race conditions will cause problems. AFAIK there are no other ways embed into sqlite3. – LPs Feb 22 '16 at 14:39
  • @LPs A db manager is not practical. Think of the database as a file format. My program is a small utlitiy program that creates and modifies these files. There might be thousands of such files on a given system. I cannot guarantee that a global manager process is running either. Well, I could, but then nobody would use my software as it would require a service to be installed on the system which usually requires elevated permissions and comes with a set of other problems. – fuz Feb 22 '16 at 14:41
  • @LPs also, a db manager would also not catch the case where someone else (possibly a different application) created the database with their own data and my program opens it. How is the database manager going to know about all databases on the system? It can't, these databases could have been downloaded from the internet or loaded from a USB flash drive that was just plugged in. – fuz Feb 22 '16 at 14:43
  • **You cannot cover all those cases.** What if I wrote an app that `rc = sqlite3_open("file::memory:", &db);` and save all data on exit, maybe 10 hours later? In the meanwhile your app added a tons of records to the FS same name DB that will be lost when that app saves its RAM records.... – LPs Feb 22 '16 at 14:49
  • @LPs I cannot control if other applications overwrite my database files and that is not the scope of this question. I do want to prevent this scenario with my own application, but that is an entirely different question which can be answered by opening the database file right in the beginning (possibly without writing anything in there) and locking it or by developing a strategy if someone else uses the file name you want to use in the meanwhile. This also requires the problem in this question to be solved, but it is a different problem. Please attempt to stick to this question. – fuz Feb 22 '16 at 14:53

1 Answers1

2

It is not possible to differentiate a newly-created database from an empty database that was created earlier.

However, an empty database is the only one with this problem. Any other database can be detected by checking if the sqlite_master table is not empty. Do this and your table creation inside a transaction, and there is no race condition.

If your first write to the database (which is when the file is actually created) is setting the application_id, then you know that any file with another ID is not yours. (Registered application IDs are unique.)

CL.
  • 173,858
  • 17
  • 217
  • 259