1

I am trying to optimize the Quartz.NET scheduler by eliminating the foreach iteration over every job.

It is a question about SQL, not about .NET.

The code iterates through every job to perform operations and, for each job, does the following:

// Check once whether the job is already stored, so we know whether to
// UPDATE the existing row or INSERT a new one.
bool existingJob = await JobExists(conn, newJob.Key, cancellationToken).ConfigureAwait(false);
try
{
    if (existingJob)
    {
        if (!replaceExisting)
        {
            throw new ObjectAlreadyExistsException(newJob);
        }
        await Delegate.UpdateJobDetail(conn, newJob, cancellationToken).ConfigureAwait(false);
    }
    else
    {
        await Delegate.InsertJobDetail(conn, newJob, cancellationToken).ConfigureAwait(false);
    }
}
catch (ObjectAlreadyExistsException)
{
    // Let the "already exists" signal reach the caller unwrapped instead of
    // being swallowed into a generic JobPersistenceException below.
    throw;
}
// The original had two byte-identical handlers for IOException and Exception;
// IOException derives from Exception, so a single handler is sufficient.
catch (Exception e)
{
    throw new JobPersistenceException("Couldn't store job: " + e.Message, e);
}

where every await issues an SQL request.

I would like to do the opposite: do a massive JobExists for all jobs, which would tell me which jobs exist and which don't, and then update the ones that exist and insert the ones that don't.

So, for example, with 200,000 jobs, instead of doing 200,000 rounds of exists-then-add-or-update, we would have 3 SQL operations: one to determine which jobs already exist, then one bulk insert and one bulk update.

But I do not know how to do the massive Exists in SQL; I only know how to write an IF EXISTS (SELECT A FROM B) query for a single row. Is such a thing possible? Or should I do a massive SELECT, or a JOIN of some sort? How should I proceed?

* EDIT *

As I have progressed the code now runs, I think, in a stored procedure like below:

USE [Quartz]
GO
/****** Object:  StoredProcedure [dbo].[procProcessJobs]    Script Date: 7/31/2018 10:46:09 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- Upserts every row of @Jobs into QRTZ_JOB_DETAILS, one row at a time via a cursor.
ALTER PROCEDURE [dbo].[procProcessJobs] 
-- Add the parameters for the stored procedure here
@Jobs JobsTableType READONLY
AS

DECLARE MYCURSOR CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR SELECT * FROM @Jobs

DECLARE @Sched NVARCHAR(120), @JobName NVARCHAR(150), @JobGroup NVARCHAR(150),
@Description NVARCHAR(250), @JobClass NVARCHAR(250), @IsDurable BIT, @IsNonConcurrent BIT,
@IsUpdateData BIT, @RequestsRecovery BIT,
-- BUG FIX: JOB_DATA carries the serialized job-data map (byte[] on the
-- client), so it must be VARBINARY(MAX), not BIT.
@JobData VARBINARY(MAX)

OPEN MYCURSOR

FETCH NEXT FROM MYCURSOR 
INTO @Sched, @JobName, @JobGroup, @Description, @JobClass, @IsDurable, @IsNonConcurrent, @IsUpdateData, @RequestsRecovery, @JobData

WHILE @@FETCH_STATUS = 0
BEGIN
    IF EXISTS(SELECT 1 FROM QRTZ_JOB_DETAILS WHERE SCHED_NAME = @Sched AND JOB_NAME = @JobName AND JOB_GROUP = @JobGroup)
    BEGIN
        /* Row already present: update it in place. */
        UPDATE QRTZ_JOB_DETAILS 
            SET DESCRIPTION = @Description,
                JOB_CLASS_NAME = @JobClass, 
                IS_DURABLE = @IsDurable, 
                IS_NONCONCURRENT = @IsNonConcurrent, 
                IS_UPDATE_DATA = @IsUpdateData, 
                REQUESTS_RECOVERY = @RequestsRecovery, 
                JOB_DATA = @JobData  
            WHERE SCHED_NAME = @Sched AND JOB_NAME = @JobName AND JOB_GROUP = @JobGroup
    END
    ELSE BEGIN
        /* Row not present: insert it. */
        INSERT INTO QRTZ_JOB_DETAILS (SCHED_NAME, JOB_NAME, JOB_GROUP, DESCRIPTION, JOB_CLASS_NAME, IS_DURABLE, IS_NONCONCURRENT, IS_UPDATE_DATA, REQUESTS_RECOVERY, JOB_DATA)  
        VALUES(@Sched, @JobName, @JobGroup, @Description, @JobClass, @IsDurable, @IsNonConcurrent, @IsUpdateData, @RequestsRecovery, @JobData)
    END

    -- BUG FIX: advance the cursor. Without this FETCH the loop re-processes
    -- the first row forever, which is why the procedure "takes a long time".
    FETCH NEXT FROM MYCURSOR 
    INTO @Sched, @JobName, @JobGroup, @Description, @JobClass, @IsDurable, @IsNonConcurrent, @IsUpdateData, @RequestsRecovery, @JobData
END

CLOSE MYCURSOR
DEALLOCATE MYCURSOR

However, the procedure takes a very long time to run and I don't know why...

hsim
  • 2,000
  • 6
  • 33
  • 69
  • See, I'm thinking something like a request that would give me only the rows that exists could be enough, because then I'd have 2 collections in .NET and I could figure those that does not exists and do both operations separately. – hsim Jul 30 '18 at 15:51

2 Answers2

2

When you have rowset operations, move them to the database engine, where the performance is best.

Pass the list of new jobs in a table-valued parameter to a stored procedure, let the processing happen there, then return a rowset that indicates the result of each job processed. Then process the results as needed in the client. The performance boost should be immense.

That's how I would try to do it, given the information provided.

Here is a sample stored procedure that would do what you want. It is not tested and your mileage might vary:

/* Create a table type. */  
CREATE TYPE JobsTableType AS TABLE   
( JobKey INT  
, Value1 INT
, Value2 INT);  
GO  

-- Walks the @Jobs table parameter and upserts each key into your table.
CREATE PROCEDURE procProcessJobs 
-- Add the parameters for the stored procedure here
@Jobs JobsTableType READONLY
AS

DECLARE MYCURSOR CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR SELECT DISTINCT JobKey FROM @Jobs

DECLARE @MyKey INT

OPEN MYCURSOR

-- BUG FIX: the cursor is named MYCURSOR; the original fetched from "MY_CURSOR".
FETCH NEXT FROM MYCURSOR INTO @MyKey

WHILE @@FETCH_STATUS = 0
BEGIN
    IF EXISTS(SELECT 1 FROM [your_table_name] WHERE [your_table_name].JobKey = @MyKey)
    BEGIN
        /*do your update code here*/
    END
    ELSE BEGIN
        /*do your insert code here*/
    END

    -- BUG FIX: advance the cursor; without this FETCH the loop never terminates.
    FETCH NEXT FROM MYCURSOR INTO @MyKey
END

CLOSE MYCURSOR
DEALLOCATE MYCURSOR

GO

As an added bonus :) here is some untested C# code:

namespace ConsoleApp1
{
    /// <summary>Plain DTO mirroring the JobsTableType columns (JobKey, Value1, Value2).</summary>
    public class Job
    {
        public int Key { get; set; }
        public int Value1 { get; set; }
        public int Value2 { get; set; }
    }

    public class Program
    {
        /// <summary>
        /// Builds the DataTable passed as the @Jobs table-valued parameter.
        /// Column names and order must match the JobsTableType definition exactly.
        /// </summary>
        public static DataTable GetJobTable(List<Job> jobs)
        {
            DataTable results = new DataTable();
            results.Columns.Add("JobKey", typeof(int)); // BUG FIX: was misspelled "JobKkey"
            results.Columns.Add("Value1", typeof(int));
            results.Columns.Add("Value2", typeof(int));

            foreach (Job item in jobs)
            {
                // BUG FIX: dropped the unused local array that carried a stray
                // fourth element the table has no column for.
                results.Rows.Add(item.Key, item.Value1, item.Value2);
            }

            return results;
        }

        static void Main(string[] args)
        {
            // BUG FIX: the original "new List<Job>;" does not compile.
            List<Job> myJobs = new List<Job>(); // populate the myJobs list

            DataTable jobsToProcess = GetJobTable(myJobs);

            // Create your connection and command objects, then add jobsToProcess
            // as a Structured (SqlDbType.Structured) parameter named "@Jobs".
        }
    }
}

Ibrahim Malluf
  • 657
  • 4
  • 6
  • That's interesting. That's something I would like to do as well. Would you care to add examples please? – hsim Jul 30 '18 at 16:04
  • Here is a link showing how to pass table parameters to a stored procedure:https://stackoverflow.com/questions/5595353/how-to-pass-table-value-parameters-to-stored-procedure-from-net-code – Ibrahim Malluf Jul 30 '18 at 16:13
  • Ok. So I understand that you would store all the jobs in a temporary table, then pass the table to a stored procedure. What would the procedure look like? What would it do? A select on a the data, or another command? – hsim Jul 30 '18 at 16:25
  • I have progressed and added a lot of code up there, but I am stuck with a declaration of cursor or usage. Could you take a look? – hsim Jul 30 '18 at 20:15
  • Don't know what you are doing there? Do you have direct access to the SQL-Server where you can create the Stored Proc or are you trying to inject the code through an ORM ?? – Ibrahim Malluf Jul 30 '18 at 21:44
  • You also have to create the TableVariable. Look at the top of my Stored Proc: /* Create a table type. */ CREATE TYPE JobsTableType AS TABLE ( JobKey INT , Value1 INT , Value2 INT); GO – Ibrahim Malluf Jul 30 '18 at 21:57
  • Oh. I don't know how to use a cursor like what we are doing right now. My idea was that I had to create a temporary table in which I'd store the data, then declare a cursor (if I understood well) which would iterate through the data to execute the rest of the script? – hsim Jul 31 '18 at 13:05
  • But I think I understand what you are trying to tell me. – hsim Jul 31 '18 at 13:07
  • Edited based on your comments – hsim Jul 31 '18 at 14:49
0

Here's how I proceeded based on @Ibrahim's answer:

Ran this script on SQL Server Management Studio

/* Create a table type whose columns mirror QRTZ_JOB_DETAILS.
   BUG FIX: the type must carry the columns the procedure actually reads
   (S.SCHED_NAME, S.JOB_NAME, ...), not the JobKey/Value1/Value2 placeholders
   from the earlier sample — with those the MERGE below cannot compile. */
CREATE TYPE JobsTableType AS TABLE   
( SCHED_NAME NVARCHAR(120)
, JOB_NAME NVARCHAR(150)
, JOB_GROUP NVARCHAR(150)
, DESCRIPTION NVARCHAR(250)
, JOB_CLASS_NAME NVARCHAR(250)
, IS_DURABLE BIT
, IS_NONCONCURRENT BIT
, IS_UPDATE_DATA BIT
, REQUESTS_RECOVERY BIT
, JOB_DATA VARBINARY(MAX));
GO  

-- Upserts the whole @Jobs batch into QRTZ_JOB_DETAILS in one set-based statement.
CREATE PROCEDURE procProcessJobs 
-- Add the parameters for the stored procedure here
@Jobs JobsTableType READONLY
AS

BEGIN
    -- HOLDLOCK guards against the classic MERGE race where two concurrent
    -- calls both see "not matched" and attempt to insert the same key.
    MERGE [dbo].[QRTZ_JOB_DETAILS] WITH (HOLDLOCK) AS T
    USING @Jobs AS S
    ON T.SCHED_NAME = S.SCHED_NAME AND T.JOB_NAME = S.JOB_NAME AND T.JOB_GROUP = S.JOB_GROUP
    WHEN MATCHED THEN UPDATE 
                    SET T.DESCRIPTION = S.DESCRIPTION,
                        T.JOB_CLASS_NAME = S.JOB_CLASS_NAME, 
                        T.IS_DURABLE = S.IS_DURABLE, 
                        T.IS_NONCONCURRENT = S.IS_NONCONCURRENT, 
                        T.IS_UPDATE_DATA = S.IS_UPDATE_DATA, 
                        T.REQUESTS_RECOVERY = S.REQUESTS_RECOVERY, 
                        T.JOB_DATA = S.JOB_DATA  
    WHEN NOT MATCHED BY TARGET THEN 
        INSERT (SCHED_NAME, JOB_NAME, JOB_GROUP, DESCRIPTION, JOB_CLASS_NAME, IS_DURABLE, IS_NONCONCURRENT, IS_UPDATE_DATA, REQUESTS_RECOVERY, JOB_DATA)
        VALUES(S.SCHED_NAME, S.JOB_NAME, S.JOB_GROUP, S.DESCRIPTION, S.JOB_CLASS_NAME, S.IS_DURABLE, S.IS_NONCONCURRENT, S.IS_UPDATE_DATA, S.REQUESTS_RECOVERY, S.JOB_DATA);

END

And how I do it with c# code:

// Build the table-valued parameter from the job list and invoke the
// dbo.procProcessJobs stored procedure in a single round trip.
DataTable tempTable = GetJobTable(jobs);

using (var cmd = PrepareCommand(conn, string.Empty))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.CommandText = "dbo.procProcessJobs";
    // Structured = table-valued parameter; the DataTable's column names and
    // order must match the server-side JobsTableType definition.
    SqlParameter parameter = new SqlParameter
    {
        ParameterName = "@Jobs",
        SqlDbType = SqlDbType.Structured,
        Value = tempTable
    };
    // Very generous timeout (12000 s) for huge batches — NOTE(review): confirm
    // this value is intentional and not meant to be milliseconds.
    cmd.CommandTimeout = 12000;

    cmd.Parameters.Add(parameter);

    // Returns the number of rows affected by the MERGE.
    return await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}


private DataTable GetJobTable(List<IJobDetail> jobs)
{
    // Builds the table-valued parameter for dbo.procProcessJobs: one row per
    // job, columns laid out in the same order as the JobsTableType definition.
    var table = new DataTable();

    table.Columns.Add(ColumnSchedulerName, typeof(string));
    table.Columns.Add(ColumnJobName, typeof(string));
    table.Columns.Add(ColumnJobGroup, typeof(string));
    table.Columns.Add(ColumnDescription, typeof(string));
    table.Columns.Add(ColumnJobClass, typeof(string));
    table.Columns.Add(ColumnIsDurable, typeof(bool));
    table.Columns.Add(ColumnIsNonConcurrent, typeof(bool));
    table.Columns.Add(ColumnIsUpdateData, typeof(bool));
    table.Columns.Add(ColumnRequestsRecovery, typeof(bool));
    table.Columns.Add(ColumnJobDataMap, typeof(byte[]));

    foreach (var detail in jobs)
    {
        // Serialize the data map only when it actually holds entries;
        // an empty map is stored as NULL.
        byte[] serializedMap = detail.JobDataMap.Count > 0
            ? SerializeJobData(detail.JobDataMap)
            : null;

        table.Rows.Add(
            SchedulerNameLiteral,
            detail.Key.Name,
            detail.Key.Group,
            detail.Description,
            GetStorableJobTypeName(detail.JobType),
            GetDbBooleanValue(detail.Durable),
            GetDbBooleanValue(detail.ConcurrentExecutionDisallowed),
            GetDbBooleanValue(detail.PersistJobDataAfterExecution),
            GetDbBooleanValue(detail.RequestsRecovery),
            serializedMap);
    }

    return table;
}
hsim
  • 2,000
  • 6
  • 33
  • 69