2

I have a sparse matrix containing roughly 100 million non-zero elements:

// [Row][Column][Element]
public IDictionary<int, IDictionary<int, decimal>> MyMatrix { get; private set; }

Getting the sum of each row is very fast:

private void RowSum()
{
    var rowTotals = new ConcurrentDictionary<int, decimal>();

    Parallel.ForEach(MyMatrix, (row) =>
    {
         rowTotals.TryAdd(row.Key, row.Value.Sum(x => x.Value));
    });
}

Getting the sum of each column is much slower:

private void ColumnSum()
{
   var columnTotals = new ConcurrentDictionary<int, decimal>();

   Parallel.ForEach(MyMatrix, (row) =>
   {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value, 
                 (key, old) => old + column.Value);
        }
   });
}

To make column calculations faster, I could build a second [Column][Row][Element] matrix, but that would double the RAM requirement. Is there any approach or data structure that would let the column calculations run as fast as the row calculations, without doubling the RAM?

Bermy Dev
  • You could use a second `Parallel.ForEach`. I don't know if it makes a huge difference, but it's worth a shot – a-ctor Jun 24 '15 at 19:54
  • Gave that a try; unfortunately, no luck. I also tried the partitioner (see the sketch below) and saw negligible gains. The row calculation takes about 2 seconds, whereas the column calculation takes roughly 14 seconds. Thanks – Bermy Dev Jun 24 '15 at 20:15
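
For reference, here is a minimal sketch of the kind of partitioner attempt mentioned in the comment above; this is an assumption, since the exact code tried isn't shown:

// Hypothetical reconstruction (assumed, not the asker's actual code):
// hand load-balanced chunks of rows to each task instead of single rows
var columnTotals = new ConcurrentDictionary<int, decimal>();
var partitioner = Partitioner.Create(MyMatrix.ToList(), loadBalance: true);

Parallel.ForEach(partitioner, (row) =>
{
    foreach (var column in row.Value)
    {
        columnTotals.AddOrUpdate(column.Key, column.Value,
            (key, old) => old + column.Value);
    }
});

Chunking reduces per-element scheduling overhead, but every update still hits the shared ConcurrentDictionary, which is consistent with the negligible gains reported.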

3 Answers

4

What could be happening is contention on the single, shared ConcurrentDictionary. If that is the case, you could try the localInit overload of Parallel.ForEach to give each partition its own local (and uncontended) Dictionary, which is then merged into the central dictionary at the end:

var columnTotals = new ConcurrentDictionary<int, decimal>();

Parallel.ForEach(MyMatrix,
    // Each task gets its own local dictionary
    () => new Dictionary<int, decimal>(),
    (row, state, colTots) =>
    {
        foreach (var column in row.Value)
        {
            if (!colTots.ContainsKey(column.Key))
            {
                colTots[column.Key] = column.Value;
            }
            else
            {
                colTots[column.Key] += column.Value;
            }
        }
        return colTots;
    },
    colTots =>
    {
        // Merge this task's local dictionary into the shared totals
        foreach (var column in colTots)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value, 
                (key, old) => old + column.Value);
        }
    });

Edit

Some timings (10M populated elements in a 100,000 × 100,000 space):

  • Your RowSum 425ms
  • Your ColumnSum 7774ms
  • localInit ColumnSum 3324ms

So the column sum is still roughly an order of magnitude slower than the row sum, but the localInit version is more than twice as fast as the original ColumnSum.

(There was also a bug in my earlier Dictionary usage.)
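
For reproduction, a minimal Stopwatch harness along these lines would produce comparable numbers (an illustrative sketch, not the answer's actual benchmark; that code is in the gist linked in the comments below):

// Time each sum with a simple Stopwatch
var sw = System.Diagnostics.Stopwatch.StartNew();
RowSum();
Console.WriteLine("RowSum: " + sw.ElapsedMilliseconds + "ms");

sw.Restart();
ColumnSum();
Console.WriteLine("ColumnSum: " + sw.ElapsedMilliseconds + "ms");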

StuartLC
  • This is looking good so far, I will need to test it a bit more, but seems to have cut the column calculation time in half =) – Bermy Dev Jun 24 '15 at 21:10
  • This works well, I figured it was a contention issue, but I did not know it could be handled so elegantly. Thanks! – Bermy Dev Jun 24 '15 at 21:19
  • Your mileage will vary quite wildly, depending on the amount of column contention. I've put some code [up here](https://gist.github.com/nonnb/4e5605983f8c282be31d). If there is little or no column contention (i.e. very very sparse matrix, or very few rows) then your `ColumnSum` will outperform. – StuartLC Jun 25 '15 at 06:09
1

I think that

 Parallel.ForEach(MyMatrix, (row) =>
   {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, 0, (key, old) => old + column.Value);
        }
   });

should be

 Parallel.ForEach(MyMatrix, (row) =>
   {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value, (key, old) => old + column.Value);
        }
   });

I think that you can make the performance more symmetrical (but not faster) by starting with a flat, tuple-keyed declaration: `public IDictionary<Tuple<int, int>, decimal> MyMatrix { get; private set; }` (see the sketch below).
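
A minimal sketch of that idea (my illustration, not RayCW's code; the row-first/column-second key convention is an assumption):

// Flat tuple-keyed sparse matrix: Key.Item1 = row, Key.Item2 = column (assumed convention)
public IDictionary<Tuple<int, int>, decimal> MyMatrix { get; private set; }

private void ColumnSum()
{
    // A column sum is now a single grouping pass over all entries,
    // structurally identical to a row sum (which would group by Item1)
    var columnTotals = MyMatrix
        .GroupBy(kvp => kvp.Key.Item2)
        .ToDictionary(g => g.Key, g => g.Sum(kvp => kvp.Value));
}

Both sums now enumerate all ~100M entries, which is why the costs equalize rather than improve.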

RayCW
  • Yes, sorry about that will edit the question. Yeah I tried the tuple before, but as you said the speed gets hit. – Bermy Dev Jun 24 '15 at 20:27
0

If the range between the highest and lowest column keys is small enough to allocate a simple int array over it, you may proceed as follows:

First identify the highest and lowest column keys, then build a correspondence array that maps each column key to an index into two parallel result arrays: one storing the column keys and one storing the column totals.

Your code would look something like this:

private void ColumnSum()
{
    // Pass 1: find the lowest and highest column keys (done sequentially
    // to avoid racing on the min/max variables)
    int lowestKey = int.MaxValue;
    int highestKey = int.MinValue;
    foreach (var row in MyMatrix)
    {
        foreach (var column in row.Value)
        {
            lowestKey = Math.Min(lowestKey, column.Key);
            highestKey = Math.Max(highestKey, column.Key);
        }
    }

    // Correspondence array: maps (column.Key - lowestKey) to a compact
    // index into the result arrays, or -1 for keys that never occur
    var correspondence = new int[highestKey - lowestKey + 1];
    for (int i = 0; i < correspondence.Length; i++)
        correspondence[i] = -1;

    // Pass 2: tag the column keys actually present in the matrix
    foreach (var row in MyMatrix)
        foreach (var column in row.Value)
            correspondence[column.Key - lowestKey] = 0;

    // Assign a compact index to every tagged key
    int columnsCount = 0;
    for (int i = 0; i < correspondence.Length; i++)
        if (correspondence[i] >= 0)
            correspondence[i] = columnsCount++;

    // Pass 3: accumulate totals into the two parallel result arrays
    var columnKeys = new int[columnsCount];
    var columnTotals = new decimal[columnsCount];   // zero-initialized by the runtime
    Parallel.ForEach(MyMatrix, (row) =>
    {
        foreach (var column in row.Value)
        {
            int j = correspondence[column.Key - lowestKey];
            // A lock is required here to avoid concurrent updates of the same
            // slot; a finer-grained (e.g. striped) lock would keep more parallelism
            lock (columnTotals)
            {
                columnKeys[j] = column.Key;
                columnTotals[j] += column.Value;
            }
        }
    });
}
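
For scale: the result arrays cost roughly 20 bytes per distinct column (a 4-byte int key plus a 16-byte decimal total), plus 4 bytes per slot of the correspondence array, which is far cheaper than duplicating the whole dictionary as a [Column][Row][Element] transpose.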
Graffito