2

I have a sparse matrix containing roughly 100 million non-zero elements:

// [Row][Column][Element]
public IDictionary<int, IDictionary<int, decimal>> MyMatrix { get; private set; }

Getting the sum of each row is very fast:

private void RowSum()
{
    var rowTotals = new ConcurrentDictionary<int, decimal>();

    Parallel.ForEach(MyMatrix, (row) =>
    {
         rowTotals.TryAdd(row.Key, row.Value.Sum(x => x.Value));
    });
}

Getting the sum of each column is much slower:

private void ColumnSum()
{
   var columnTotals = new ConcurrentDictionary<int, decimal>();

   Parallel.ForEach(MyMatrix, (row) =>
   {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value, 
                 (key, old) => old + column.Value);
        }
   });
}

To make column calculations faster, I could build a second [Column][Row][Element] matrix, but that would double the RAM requirement. Is there any approach or data structure that would let the column calculations run as fast as the row calculations, without doubling the RAM?

Bermy Dev
  • You could use a second `Parallel.ForEach`. I don't know if it makes a huge difference, but it's worth a shot – a-ctor Jun 24 '15 at 19:54
  • Gave that a try; unfortunately, no luck. I also tried the partitioner (see the sketch below) and saw negligible gains. The row calculation takes about 2 seconds, whereas the column calculation takes roughly 14 seconds. Thanks – Bermy Dev Jun 24 '15 at 20:15
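
For reference, here is a minimal sketch of the kind of partitioner attempt mentioned in the comment above; this is an assumption, since the exact code tried isn't shown:

// Hypothetical reconstruction (assumed, not the asker's actual code):
// hand load-balanced chunks of rows to each task instead of single rows
var columnTotals = new ConcurrentDictionary<int, decimal>();
var partitioner = Partitioner.Create(MyMatrix.ToList(), loadBalance: true);

Parallel.ForEach(partitioner, (row) =>
{
    foreach (var column in row.Value)
    {
        columnTotals.AddOrUpdate(column.Key, column.Value,
            (key, old) => old + column.Value);
    }
});

Chunking reduces per-element scheduling overhead, but every update still hits the shared ConcurrentDictionary, which is consistent with the negligible gains reported.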

3 Answers

4

What could be happening is contention on the single, shared ConcurrentDictionary. If that is the case, you could try the localInit overload of Parallel.ForEach to give each partition its own local (and uncontended) Dictionary, which is then merged into the central dictionary at the end:

var columnTotals = new ConcurrentDictionary<int, decimal>();

Parallel.ForEach(MyMatrix,
    // Each task gets its own local dictionary
    () => new Dictionary<int, decimal>(),
    (row, state, colTots) =>
    {
        foreach (var column in row.Value)
        {
            if (!colTots.ContainsKey(column.Key))
            {
                colTots[column.Key] = column.Value;
            }
            else
            {
                colTots[column.Key] += column.Value;
            }
        }
        return colTots;
    },
    colTots =>
    {
        // Merge this task's local dictionary into the shared totals
        foreach (var column in colTots)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value, 
                (key, old) => old + column.Value);
        }
    });

Edit

Some timings (10M populated elements in a 100,000 × 100,000 space):

  • Your RowSum 425ms
  • Your ColumnSum 7774ms
  • localInit ColumnSum 3324ms

So the column sum is still roughly an order of magnitude slower than the row sum, but the localInit version is more than twice as fast as the original ColumnSum.

(There was also a bug in my earlier Dictionary usage.)
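
For reproduction, a minimal Stopwatch harness along these lines would produce comparable numbers (an illustrative sketch, not the answer's actual benchmark; that code is in the gist linked in the comments below):

// Time each sum with a simple Stopwatch
var sw = System.Diagnostics.Stopwatch.StartNew();
RowSum();
Console.WriteLine("RowSum: " + sw.ElapsedMilliseconds + "ms");

sw.Restart();
ColumnSum();
Console.WriteLine("ColumnSum: " + sw.ElapsedMilliseconds + "ms");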

StuartLC
  • This is looking good so far, I will need to test it a bit more, but seems to have cut the column calculation time in half =) – Bermy Dev Jun 24 '15 at 21:10
  • This works well, I figured it was a contention issue, but I did not know it could be handled so elegantly. Thanks! – Bermy Dev Jun 24 '15 at 21:19
  • Your mileage will vary quite wildly, depending on the amount of column contention. I've put some code [up here](https://gist.github.com/nonnb/4e5605983f8c282be31d). If there is little or no column contention (i.e. very very sparse matrix, or very few rows) then your `ColumnSum` will outperform. – StuartLC Jun 25 '15 at 06:09
1

I think that

 Parallel.ForEach(MyMatrix, (row) =>
   {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, 0, (key, old) => old + column.Value);
        }
   });

should be

 Parallel.ForEach(MyMatrix, (row) =>
   {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value, (key, old) => old + column.Value);
        }
   });

I think that you can make the performance more symmetrical (but not faster) by starting with a flat, tuple-keyed declaration: `public IDictionary<Tuple<int, int>, decimal> MyMatrix { get; private set; }` (see the sketch below).
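
A minimal sketch of that idea (my illustration, not RayCW's code; the row-first/column-second key convention is an assumption):

// Flat tuple-keyed sparse matrix: Key.Item1 = row, Key.Item2 = column (assumed convention)
public IDictionary<Tuple<int, int>, decimal> MyMatrix { get; private set; }

private void ColumnSum()
{
    // A column sum is now a single grouping pass over all entries,
    // structurally identical to a row sum (which would group by Item1)
    var columnTotals = MyMatrix
        .GroupBy(kvp => kvp.Key.Item2)
        .ToDictionary(g => g.Key, g => g.Sum(kvp => kvp.Value));
}

Both sums now enumerate all ~100M entries, which is why the costs equalize rather than improve.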

RayCW
  • Yes, sorry about that will edit the question. Yeah I tried the tuple before, but as you said the speed gets hit. – Bermy Dev Jun 24 '15 at 20:27
0

If the range between the highest and lowest column keys is small enough to allocate a simple int array over it, you may proceed as follows:

First identify the highest and lowest column keys, then build a correspondence array that maps each column key to an index into two parallel result arrays: one storing the column keys and one storing the column totals.

Your code would look something like this:

private void ColumnSum()
{
    // Pass 1: find the lowest and highest column keys (done sequentially
    // to avoid racing on the min/max variables)
    int lowestKey = int.MaxValue;
    int highestKey = int.MinValue;
    foreach (var row in MyMatrix)
    {
        foreach (var column in row.Value)
        {
            lowestKey = Math.Min(lowestKey, column.Key);
            highestKey = Math.Max(highestKey, column.Key);
        }
    }

    // Correspondence array: maps (column.Key - lowestKey) to a compact
    // index into the result arrays, or -1 for keys that never occur
    var correspondence = new int[highestKey - lowestKey + 1];
    for (int i = 0; i < correspondence.Length; i++)
        correspondence[i] = -1;

    // Pass 2: tag the column keys actually present in the matrix
    foreach (var row in MyMatrix)
        foreach (var column in row.Value)
            correspondence[column.Key - lowestKey] = 0;

    // Assign a compact index to every tagged key
    int columnsCount = 0;
    for (int i = 0; i < correspondence.Length; i++)
        if (correspondence[i] >= 0)
            correspondence[i] = columnsCount++;

    // Pass 3: accumulate totals into the two parallel result arrays
    var columnKeys = new int[columnsCount];
    var columnTotals = new decimal[columnsCount];   // zero-initialized by the runtime
    Parallel.ForEach(MyMatrix, (row) =>
    {
        foreach (var column in row.Value)
        {
            int j = correspondence[column.Key - lowestKey];
            // A lock is required here to avoid concurrent updates of the same
            // slot; a finer-grained (e.g. striped) lock would keep more parallelism
            lock (columnTotals)
            {
                columnKeys[j] = column.Key;
                columnTotals[j] += column.Value;
            }
        }
    });
}
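
For scale: the result arrays cost roughly 20 bytes per distinct column (a 4-byte int key plus a 16-byte decimal total), plus 4 bytes per slot of the correspondence array, which is far cheaper than duplicating the whole dictionary as a [Column][Row][Element] transpose.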
Graffito