0

I'm processing a large excel (10k records) and it is a requirement that this process run on multiple threads to improve performance.

Right now i'm doing a check if row <= 2000 then that's fine run Utils.IxGenerateWithData with all records. But if row > 2000 (e.g. 10k) I want to split these into multiple threads that process Utils.IxGenerateWithData with 2000 records each.

Please help

using (Stream contentStream = await requestContent.ReadAsStreamAsync())
                            {
                                Workbook workbook = new Workbook(contentStream);
                                Worksheet worksheet = workbook.Worksheets[0];
                                int column = 0; // first column
                                Cell lastCell = worksheet.Cells.EndCellInColumn((short)column);

                                //Run on multiple threads if the file has more than 2000 records
                                if (lastCell.Row > 2000)
                                {

                                    //Not sure what to do here



                                    // Infiniti GenerateWithData Web Service
                                    Thread thread = new Thread(() => Utils.IxGenerateWithData(payloadSettings.ProjectGUID, payloadSettings.DatasourceGUID, xmlContent, payloadSettings.InfinitiUsername, payloadSettings.InfinitiPassword, payloadSettings.ServiceWSDL));
                                    thread.Start();
                                }
                                else
                                {
                                    for (int row = 0; row <= lastCell.Row; row++)
                                    {
                                        Cell cell = worksheet.Cells.GetCell(row, column);
                                        xmlContent += cell.StringValueWithoutFormat;
                                    }

                                    // Infiniti GenerateWithData Web Service
                                    Utils.IxGenerateWithData(payloadSettings.ProjectGUID, payloadSettings.DatasourceGUID, xmlContent, payloadSettings.InfinitiUsername, payloadSettings.InfinitiPassword, payloadSettings.ServiceWSDL);
                                }                               
                            }

2 Answers2

0

A good start would be to determine how many threads you want to start. If you are going on 2000 rows per thread, then threadCount would be calculated as follows:

var threadCount = (lastCell.Row / 2000) + 1;

The 1 is added to ensure that the thread will never have more than 2000 rows but it can have less.

Then calculate rowsPerThread as follows:

var rowsPerThread = lastCell.Row / threadCount;

Lastly have a for loop to start the threads passing it the array of rows that it should process. Here I would create a class that are created in the for loop, and the rows it need to process is passed through in the constructor. Then have a Start method that starts a thread to process the rows in the object.

An outline of such a class would look as follows:

public class ExcelRowProcessor()
{
    private List<ExcelRow> _rows = new List<ExcelRow>();
    public ExcelRowProcessor(IEnumerable<ExcelRow> rows)
    {
        _rows.AddRange(rows);
    }

    public void Start()
    {
        // Start the thread here.
    }
}

I hope this helps.

Jaco B
  • 994
  • 7
  • 16
  • after Determine thread count and Calculate rows per thread. I have this for loop for(int threadNum = 0; threadNum <= threadCount; threadNum++ ) { List cells = new List(); cells.AddRange(); } What I don't understand is how can I make sure the first time of the loop it will run from 1 to 2000 and the second time it will run from 2001 to etc... – harpermartin Dec 07 '16 at 06:34
0

Sorry to make this a new answer, but I don't have the reputation yet to be able to post on Jaco's yet.

Anyways, in general you don't want to determine the number of threads based on workload/bucketsize. It is better to determine the bucket size based on the number of CPU Cores. This is in order to prevent thread switching, also allowing one core for the OS/Virus scanner can help as well.

To get thread/core/process count ... see this post: How to find the Number of CPU Cores via .NET/C#?

var threadCount = cpuCoreCount - 1; //TODO: use code from above URL
if (0 == threadCount) {
   threadCount = 1;
}
var rowsPerThread = lastCell.Row / threadCount;  // As Jaco posted

So back to your question about how to thread:

using (Stream contentStream = await requestContent.ReadAsStreamAsync())
{
    Workbook workbook = new Workbook(contentStream);
    Worksheet worksheet = workbook.Worksheets[0];
    int column = 0; // first column
    Cell lastCell = worksheet.Cells.EndCellInColumn((short)column);
    List<IAsyncResult> asyncResults = new List<IAsyncResult>();
    string xmlContent = ""; // assuming this is local


    for (int row = 0; row <= lastCell.Row; row++)
    {
        Cell cell = worksheet.Cells.GetCell(row, column);
        xmlContent += cell.StringValueWithoutFormat;

        if (((row > 0) && (row % rowsPerThread == 0)) || (rows == lastCell.Row))
        {
            var caller = new GenerateDelegate(Generate);
            asyncResults.Add(caller.BeginInvoke(xmlContent, null, null));
            xmlContent = "";
        }
    }

    // Wait for the threads
    asyncResults.ForEach(result => {
       while(result.IsCompleted == false) {
           Thread.Sleep(250);
       }
   });
}

Place this code outside the function

private delegate void GenerateDelegate(string xmlContent);

///<summary>
/// Call Infiniti GenerateWithData Web Service
///<summary>
private void Generate(string xmlContent) 
{
    Utils.IxGenerateWithData(payloadSettings.ProjectGUID, payloadSettings.DatasourceGUID, xmlContent, payloadSettings.InfinitiUsername, payloadSettings.InfinitiPassword, payloadSettings.ServiceWSDL);
}
Community
  • 1
  • 1
DevNull
  • 131
  • 7