SQLite alternate SELECT followed by insert for every record in a set of 1 miilion records

Question

I am making an application in which data is transferred to a database which consist of 2 tables, and for the main table I need to check whether the record I am trying to insert already exists or not.

If the record is found , its Id is used for storing along with the record to store in the second table. If the record is not found in the first table, a new entry is made and the id for the new record is stored along with data in second table

I am having issue while executing select before insert every time to get the existing ID if the record exists.

If i Index the column to be searched , it decreases the speed of my insertion exponentially. after a hundred thousand records.
If i do not index the column to be searched it still takes a hell lot of time.
Incase for direct insertion without select , the time taken to insert all records were 10 secs - 12 secs.
So i need help with how can i increase the speed of my task

So far i have refered these links , but this just helped me to increase the speed of inserts but slows down when i use select to Improve INSERT-per-second performance of SQLite?

I thought of using INSERT OR IGNORE to check if a record exists, and if doesn't it will insert it to database, but even in that case to get Id of the record i need to do a select

CODE USED :

int TransferUrls(int numberOfEntries, PCHAR dbName, PCHAR urlList)
{
sqlite3      *dbHandle              = NULL; 
int          returnCode             = 0;
int          mainTableEntryNumber   = 0;
int          processedUrlId         = 0;
int          returnCode1            = 0;
sqlite3_stmt *selectIdStatement     = NULL;
sqlite3_stmt *insertProcessedUrl    = NULL;
sqlite3_stmt *insertRawUrl          = NULL;
int          numberOfEntriesRead    = 0;
clock_t      cStartClock;
WCHAR        rawUrl[MAX_URL_LENGTH] = L"";
CHAR         processedUrl[MAX_URL_LENGTH] = "";
CHAR         url[MAX_URL_LENGTH] = "";
int          status = 0;
FILE         *fp = NULL;
errno_t      err;

memset(rawUrl, '\0', sizeof(rawUrl));
memset(processedUrl, '\0', sizeof(processedUrl));   
memset(url , '\0', sizeof(url));

//open the file from which we are goin to read the links
err = fopen_s(&fp, urlList, "r");
if(err != 0)
{
    printf("Error opening the file to transfer urls to databse\n");
    return TRANSFER_ERROR;
}

//open a connection to the database in which we would have to enter
returnCode = sqlite3_open(dbName, &dbHandle);

if(returnCode != SQLITE_OK)
{
    printf("Error occured while opening database \n");
    return TRANSFER_ERROR;
}

//Get the idNo from the MainTable, used for some calculation , ignore it
if(getLastIdNumber(dbHandle, mainTableEntryNumber) == ERROR)
{
    fclose(fp);
    sqlite3_close(dbHandle);
    return TRANSFER_ERROR;
}       

//Create a prepared statement to get entry number for an processed url
returnCode = sqlite3_prepare_v2(dbHandle, "SELECT idNo FROM urlDetails  where processedUrl = ?;", -1, &selectIdStatement, NULL);
if(returnCode != SQLITE_OK)
{
    printf("Error occured while preparing select statement for getting idNo\n");
    printf("Error msg : %s \n", sqlite3_errmsg(dbHandle));
    fclose(fp);
    sqlite3_close(dbHandle);
    return TRANSFER_ERROR;
}

//create prepared statememnt to insert entry in main table for the processed url
returnCode = sqlite3_prepare_v2(dbHandle, "INSERT INTO urlDetails(idNo, processedUrl, insertTime) VALUES(?, ?, datetime('now','unixepoch', 'localtime'));", -1, &insertProcessedUrl, NULL);
if(returnCode != SQLITE_OK)
{
    printf("Error occured wile preparing insert statement to insert values for processed url\n");
    printf("Error msg: %s\n", sqlite3_errmsg(dbHandle));
    sqlite3_finalize(selectIdStatement);
    fclose(fp);
    sqlite3_close(dbHandle);
    return TRANSFER_ERROR;
}

returnCode = sqlite3_prepare_v2(dbHandle, "INSERT INTO urlData VALUES(?, ?);", -1, &insertRawUrl, NULL);
if(returnCode != SQLITE_OK)
{
    printf("Error occured wile preparing insert statement to insert values for raw url\n");
    printf("Error msg: %s\n", sqlite3_errmsg(dbHandle));
    sqlite3_finalize(selectIdStatement);
    sqlite3_finalize(insertProcessedUrl);
    fclose(fp);
    sqlite3_close(dbHandle);
    return TRANSFER_ERROR;
}


//make synchronous mode off
returnCode = sqlite3_exec(dbHandle, "PRAGMA synchronous = OFF;", NULL, NULL,NULL);
if(returnCode != SQLITE_OK)
{
    printf("Error occured while making pragma synchronous mode off\n");
    status = TRANSFER_ERROR;
    goto EXIT;  
}

//make journal_mode to memory
returnCode = sqlite3_exec(dbHandle, "PRAGMA journal_mode = OFF;", NULL, NULL,NULL);
if(returnCode != SQLITE_OK)
{
    printf("Error occured while making pragma jorunal mode to memory\n");
    status = TRANSFER_ERROR;
    goto EXIT;
}

returnCode = sqlite3_exec(dbHandle,"BEGIN TRANSACTION;", NULL , NULL , NULL);
if(returnCode != SQLITE_OK)
{
    printf("Error occured while making pragma synchronous mode off\n");
    status = TRANSFER_ERROR;
    goto EXIT;
}   

// get the number of seconds since the program was run
cStartClock = clock();

int entryNumber = 0;
//Read data from a table which consists of raw url and insert the data in both the tables urlData and urlDetails based on conditions
while(fgets(url, MAX_URL_LENGTH, fp) != NULL)
{
    entryNumber++;

    //check for the number of entries
    if(entryNumber == numberOfEntries)
    {
        break;              
    }
    else
    {
        memset(rawUrl, '\0', sizeof(rawUrl));

        //Replace the trailing newline character in string with string terminating character
        size_t len = strlen(url);

        if (len > 0 && url[len-1] == '\n') 
        {
            url[--len] = '\0';
        }

        //printf("Entry %d : %s\n",entryNumber,  url);

        numberOfEntriesRead++;  

        //printf("Url is  %s \n", url);

        //getting the name for the rawUrl
        if(swprintf_s(rawUrl, MAX_URL_LENGTH, L"http://%S", url) == -1)
        {
            printf("Error occured while preparing raw url for %s \n", url);
            continue;
        }
        //here we will perform the further operation i.e give a task to the user i.e get the processed url
        if(getUrlDetails(rawUrl, processedUrl) == GET_URL_ERROR)
        {
            printf("Error occured while gettting processed url from %s raw url\n", rawUrl);
            continue;
        }

        //1. check for the url in the database 
        //2. If present then store its id number and make an entry for the raw url
        //3. if not present then store the id number for the proceesed url ,and make entry for both processed and raw url
        returnCode = sqlite3_bind_text(selectIdStatement, 1, processedUrl, -1, NULL);
        if(returnCode != SQLITE_OK)
        {
            printf("Error occcured while binding for select Id statement for url %s\n", processedUrl);
            printf("Error msg : %s \n",sqlite3_errmsg(dbHandle));
            status = TRANSFER_ERROR;
            break;
        }

        returnCode = sqlite3_step(selectIdStatement);
        if(returnCode == SQLITE_ROW)
        {
            //this indicates that the processed url is already present in the database
            processedUrlId = sqlite3_column_int(selectIdStatement, 0);

        }
        else if (returnCode == SQLITE_DONE)
        {
            mainTableEntryNumber++;

            processedUrlId = mainTableEntryNumber;

            //make an entry for the processed url in the table
            //bind values to insert statement to set the idNumber
            returnCode = sqlite3_bind_int(insertProcessedUrl, 1, processedUrlId);
            if(returnCode != SQLITE_OK)
            {
                printf("Error occured while binding select statement for insert statement \n");
                status = TRANSFER_ERROR;
                break;
            }

            //bind the name of the proceesed url to the insert statement for making a new entry in Main table
            returnCode = sqlite3_bind_text(insertProcessedUrl, 2, processedUrl, -1, NULL);
            if(returnCode != SQLITE_OK)
            {
                printf("Error occured while binding select statement for insert statement \n");
                status = TRANSFER_ERROR;
                break;
            }

            //execute the insert statement
            returnCode = sqlite3_step(insertProcessedUrl);
            if(returnCode != SQLITE_DONE)
            {
                printf("Error occured while insert entry into the new table \n");
                printf("Error msg: %s\n", sqlite3_errmsg(dbHandle));
                status = TRANSFER_ERROR;
                break;
            }

            sqlite3_reset(insertProcessedUrl);
        }
        else if (returnCode == SQLITE_ERROR)
        {
            printf("Error occured while fetching data for processed url %s\n", processedUrl);
            status = TRANSFER_ERROR;
            break;
        }

        //make an entry for the raw url into the other table
        sqlite3_bind_text(insertRawUrl, 1, url,-1, NULL); 
        sqlite3_bind_int(insertRawUrl, 2, processedUrlId);

        returnCode = sqlite3_step(insertRawUrl);
        if(returnCode != SQLITE_DONE)
        {
            printf("Error occured while inserting an entry for the raw url \n");
            printf("Error msg : %s\n", sqlite3_errmsg(dbHandle));
            status = TRANSFER_ERROR;
            break;
        }
        sqlite3_reset(insertRawUrl);
        sqlite3_reset(selectIdStatement);
    }   

    memset(url, '\0', sizeof(url));
}

//Calculate the number of seconds required for the program to complete
printf("%d Entries were processed in %4.6f seconds\n", numberOfEntriesRead, (clock() - cStartClock)/(double)CLOCKS_PER_SEC);

sqlite3_exec(dbHandle,"END TRANSACTION;", NULL , NULL , NULL);



    //free the memory allocated for the select statement as we are done with it
EXIT :  sqlite3_finalize(selectIdStatement);
        sqlite3_finalize(insertProcessedUrl);
        sqlite3_finalize(insertRawUrl);

        //Close the handle to the database connection
        sqlite3_close(dbHandle);

        //close the handle to file
        fclose(fp);

        return status;
}

@CL. I have tried the method in which i have made use of prepared statements for querying instead of sqlite3_exec . I have kept the journal mode in memory and synchronous mode off. Also i have kept the column to be searched as unique — ash123, Mar 23 '15 at 08:32
@CL. My requirement is to check for an entry in the database, basically it will be a unique string, if the record already exists, i will use it id to be stored in some other table for some analysis purpose . so basically what i do is , i use the select query to get if of the record, if no row is returned then , i insert the new entry and use the id while time of insertion, else i use the id i got from the row returned by the select query . so the problem here is the 1 million select statements are slowing down by work a lot, can u help me with an alternative for it — ash123, Mar 23 '15 at 08:49
@CL. It was all being done in a single transaction , still the issue was there — ash123, Mar 23 '15 at 09:52
Then show the actual code you're using, and the time you're measuring for a single query. — CL., Mar 23 '15 at 09:53
Not sure whether my suggestion [here](http://stackoverflow.com/questions/11790595/sqlite-inner-join-update-using-values-from-another-table/22481731#22481731) is of any use (particularly with 1M records): the idea is that all the updates happen as a result of triggers, so you don't have to select-before-insert. This would be followed by an insert-or-ignore on what's left. — TripeHound, Mar 23 '15 at 10:56
@TripeHound i will have a look at it . thank you, will let you know, if it works for me — ash123, Mar 23 '15 at 11:43
@CL. I have posted the code , can u have a look at it and help me out — ash123, Mar 23 '15 at 11:53
upto hundred thousand records everything works fine , but as the number of records increase, i guess the select statement is slowing down the performance — ash123, Mar 23 '15 at 13:44
How do you measure it? Do you measure overall function timing, or do you measure each loop iteration? You should also measure each `sqlite3_step` call, like @CL suggestes. That's how you learn the real cause of the problem. — Googie, Mar 25 '15 at 11:16
@Googie i tried doing both, and i concluded that as the number of records increase , time taken to select a value also increases. — ash123, Mar 25 '15 at 11:36
@Googie The main aim behind this is to achieve normalization , but this is actually reducing the speed a lot, up to hundred thousand records everything is fine but as i approach for records above 5 hundred thousand, the speed decreases to insert a value — ash123, Mar 25 '15 at 11:38
@Googie I have tried analying my program once again , and i came to know that the select statement was not actually taking time, i increased my cache size to 50, 000 , so it showed a slight performance improvement , how should i configure my database accordingly, any idea — ash123, Mar 30 '15 at 10:28
@Googie the one we use in sqlite, i changed the default cache size, nut a new issue has arise , as i have kept the column a UNIQUE constraint , as the number of inserts increase, the speed goes on decreasing gradually — ash123, Mar 31 '15 at 09:46
Can you also paste a DDL (a definition) of tables `urlDetails` and `urlData`? Are there any indexes created for those tables? — Googie, Mar 31 '15 at 12:09
@Googie yes sure, and yes both the table have a UNIQUE constraint on their column processedUrl and rawUrl, — ash123, Mar 31 '15 at 15:49
@Googie schema for both the tables is urlData(idNo int, rawUrl VARCHAR(3000) UNIQUE); urlDetails(idNo int , processedUrl varchar(3000) UNIQUE, checkedUrl int, domainCheck int, serviceCheck int, pageInfo int, insertTime TEXT, checkTime TEXT); — ash123, Mar 31 '15 at 15:50
Try removing unique constraints and instead create a separate index for each of those 2 columns, making the index unique. See if it makes any difference. — Googie, Apr 01 '15 at 07:37
@Googie sure, will check that and let you know about the results — ash123, Apr 01 '15 at 07:40

SQLite alternate SELECT followed by insert for every record in a set of 1 miilion records

0 Answers0