
I am wondering if using parallel processing will be advantageous in the following scenario:

I have written a basic console application that is intended to be run by Windows Task Scheduler. Its main goal in life is to process n flat files using a utility called SQL Loader. The DBA (whose idea it was to use SQL Loader, keep as much business logic as possible in Oracle, and use .NET only to call out to the utility) wants the ability to schedule/run multiple copies of this console app to make it faster, since some of the files may be rather large and may take some time to process.

Instead of running "multiple copies", I thought of running separate threads to share the workload, where the number of threads can be configured by an admin (they want as much control over this app as possible).
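
By "separate threads" I mean something roughly like the sketch below, just to illustrate the idea; the "ThreadCount" setting name and the monitorId variable are placeholders, not real parts of the app:

//rough sketch only: spin up an admin-configurable number of worker threads,
//each calling the same entry point ("ThreadCount" is a made-up setting name)
int threadCount = Convert.ToInt32(ConfigurationManager.AppSettings["ThreadCount"]);
List<Thread> workers = new List<Thread>();

for (int i = 0; i < threadCount; i++)
{
    //monitorId would come from wherever the app normally gets it
    Thread worker = new Thread(() => FindFilesToLoad(monitorId));
    worker.Start();
    workers.Add(worker);
}

//wait for every worker to finish before the console app exits
workers.ForEach(t => t.Join());

Part of what I am unsure about is whether each thread should call FindFilesToLoad itself (so each pulls its own batch of ids), or whether the list of ids should be split up first.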

I am relatively new to .NET/C# and have no experience with this, so while trying to think of an elegant way to share the workload I came across the Task Parallel Library.

Here is the main function that gets a list of ids (representing files) to load using sqlldr:

public static void FindFilesToLoad(int monitorId)
{
    int dxmtId = 0;

    string connectionString = Helper.GetPlxConnectionString();
    using (OracleConnection connection = new OracleConnection(connectionString))
    {
        using (OracleCommand cmd = new OracleCommand("OraclePackage.getWorkToDo", connection))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.AddInputParameter<int>("P_tpmID", OracleDbType.Int16, monitorId);
            OracleParameter p_dxmtId = cmd.AddOutputParameter<OracleParameter>("O_dxmtID", OracleDbType.Table, null);
            p_dxmtId.ObjectTypeName = "Work_Table";

            connection.Open();
            cmd.ExecuteNonQuery();

            OracleTable dxmtIds = (OracleTable)p_dxmtId.Value;
            if (dxmtIds.Count != 0)
            {
                for (int i = 0; i < dxmtIds.Count; i++)
                {
                    try
                    {
                        dxmtId = Convert.ToInt32(dxmtIds[i]);

                        LoadFileIntoDatabase(monitorId, dxmtId);
                    }
                    catch (Exception e)
                    {
                        SetFileLoadStatus(FileLoadStatus.Error, monitorId, dxmtId, e.Message);
                    }
                }
            }
            else
            {
                LogMessage(monitorId, "FindFilesToLoad", "No file records to process.");
            }
        }
    }
}

Here is the file loading part:

public static void LoadFileIntoDatabase(int monitorId, int fileId)
{
    try
    {
        //create files for sql loader
        string timeStamp = DateTime.Now.ToString("yyMMddHHmmss");
        string controlFileWithPath = CreateSqlLoaderControlFile(monitorId, timeStamp);
        string controlFileName = Path.GetFileName(controlFileWithPath);
        string parFileWithPath = CreateParFile(monitorId, controlFileName, timeStamp);

        ProcessStartInfo startInfo = new ProcessStartInfo(@"CMD.EXE")
        {
            WorkingDirectory = ConfigurationManager.AppSettings["ExportPath"],
            Arguments = @"/c SQLLDR CONTROL=" + controlFileWithPath + " PARFILE=" + parFileWithPath,
            //standard output is not redirected because it is never read
            //(redirecting it without reading it can fill the buffer and block SQLLDR)
            RedirectStandardError = true,
            UseShellExecute = false
        };

        using (Process process = new Process())
        {
            process.StartInfo = startInfo;
            process.Start();

            //read stderr before waiting so a full stream buffer cannot block SQLLDR from exiting
            string errorMsg = process.StandardError.ReadToEnd();
            process.WaitForExit();
            int exitCode = process.ExitCode;

            //0 is success, otherwise fail
            if (exitCode == 0)
            {
                File.Delete(controlFileWithPath);
                File.Delete(parFileWithPath);
            }
            else
            {
                SetFileLoadStatus(FileLoadStatus.Error, monitorId, fileId, errorMsg);
                SetLoadStatusToError(monitorId, fileId, errorMsg);
            }
        }
    }
    catch (Exception e)
    {
        SetLoadStatusToError(monitorId, fileId, e.Message);
        SetFileLoadStatus(FileLoadStatus.Error, monitorId, fileId, e.Message);
    }
}
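
One thing I am not 100% sure about is the "0 is success, otherwise fail" assumption above. From what I understand of the SQL*Loader documentation, a non-zero exit code can also mean a warning (some rows rejected or discarded) rather than a hard failure, so I was considering a small helper along these lines. The numeric values are my assumption and should be verified against the docs for the SQL*Loader version/platform in use:

//sketch of an exit-code check based on my reading of the SQL*Loader docs:
//0 = success, 1 = failure, 2 = warning (rejected/discarded rows), higher values = fatal.
//These values are an assumption - verify them for your SQL*Loader version/platform.
private static bool LoadSucceeded(int exitCode, out string message)
{
    switch (exitCode)
    {
        case 0:
            message = string.Empty;
            return true;
        case 2:
            message = "SQL*Loader finished with warnings - check the log for rejected/discarded rows.";
            return false; //or true, if warnings are acceptable
        default:
            message = "SQL*Loader failed with exit code " + exitCode + ".";
            return false;
    }
}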

Would something as simple as this be effective for making the file processing faster, and would it be preferable to scheduling multiple copies via Windows Task Scheduler?

static void Main()
{
    //get number of parallel tasks from db
    int tasks = GetTaskCount();
    for (int i = 0; i < tasks; i++)
    {
        Parallel.Invoke(() =>
        {
            //monitorId as used in FindFilesToLoad above
            FindFilesToLoad(monitorId);
        });
    }

    LogMessage("Returned from Parallel.Invoke");
}

Any advice/tips would be much appreciated.

inspectorGadget
  • `Parallel.For` is probably more suited to your last code block. It will encapsulate your entire `for` loop and block until all are done. I don't see the issue with spinning multiple threads for this process invoker. You could also see https://stackoverflow.com/questions/470256/process-waitforexit-asynchronously for a non-blocking way of waiting for the process to finish if you're looking at that level of optimization. – jamespconnor Jan 12 '16 at 21:25
  • Thank you. I tried your suggestion, which initially excited me because I noticed a 2x speed increase when testing with 1000 iterations where I was simply creating/writing to/deleting files. However, when I tried to run the SQL Loader process per iteration, it threw "file not found" errors. Obviously this is due to the fact that my understanding of multi-threading is um...limited. How can I isolate things in my loop so the files that are created are available when the SQL Loader process needs them? Or (and this may be a more important question) should I be using a different approach? – inspectorGadget Jan 13 '16 at 22:06

0 Answers