
I'm trying to implement neural network and deep learning code in C#. The sample code in my textbook is written in Python, so I'm trying to convert it to C#.

My problem is that calculating the dot product with numpy is far faster than with my C# code written from scratch.

While my numpy code takes a few seconds to calculate the dot product 1000 times, my C# code takes much longer.

My question is: how can I make my C# code faster?

Here is the numpy code:

C:\temp>more dot.py
from datetime import datetime

import numpy as np

W = np.random.randn(784, 100)
x = np.random.randn(100, 784)

print(datetime.now().strftime("%Y/%m/%d %H:%M:%S"))

for i in range(0,1000):
    np.dot(x, W)

print(datetime.now().strftime("%Y/%m/%d %H:%M:%S"))

C:\temp>\Python35\python.exe dot.py
2017/02/08 00:49:14
2017/02/08 00:49:16
C:\temp>

And this is the C# code:

public static double[,] dot(double[,] a, double[,] b)
{
    double[,] dot = new double[a.GetLength(0), b.GetLength(1)];

    for (int i = 0; i < a.GetLength(0); i++)
    {
        for (int j = 0; j < b.GetLength(1); j++)
        {
            // the next loop looks way slow according to the profiler
            for (int k = 0; k < b.GetLength(0); k++)
                dot[i, j] += a[i, k] * b[k, j];
        }
    }
    return dot;
}

static void Main(string[] args)
{
    // compatible function with np.random.randn()
    double[,] W = random_randn(784, 100);
    double[,] x = random_randn(100, 784);

    Console.WriteLine(DateTime.Now.ToString("F"));
    for (int i = 0; i < 1000; i++)
        dot(W, x);
    Console.WriteLine(DateTime.Now.ToString("F"));
}

Regards,

snaga
  • Why are you implementing neural networks from scratch? If it's a learning exercise, then it does not matter much how fast the code runs. If it's to get stuff working well, then use already-written, high-quality software. There are many packages with neural network models, like TensorFlow, H2O, and Torch. They are all much better engineered, with more features and higher speed, than what just one person can build in C#. – Geoffrey Anderson Feb 07 '17 at 16:23
  • Right. It's just for learning both C# and deep learning, but I found that calculating dot products was much slower than I expected, and it was painful to run the examples (ported to C#) from my textbook. So I would like to improve the performance. I'm going to use existing libraries for my future production systems, for the sake of performance and a better implementation. – snaga Feb 08 '17 at 16:51

4 Answers


Numpy is extremely optimized because it uses BLAS. You will probably not get comparable performance from your own code.

The dot product is, however, very easy to parallelize. You could look into multi-threading it yourself, but to be honest it's not worth the effort. Just look for a library that implements the dot product for you and use that!
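
For what it's worth, here is a minimal multi-threaded sketch (my own illustration, with a hypothetical DotParallel name, not the asker's code) that parallelizes the outer loop of the question's dot() with Parallel.For. It won't reach BLAS speed, but it shows the idea:

using System.Threading.Tasks;

public static double[,] DotParallel(double[,] a, double[,] b)
{
    int n = a.GetLength(0), m = a.GetLength(1), p = b.GetLength(1);
    var result = new double[n, p];

    // Each parallel iteration fills a disjoint set of rows, so no locking is needed.
    Parallel.For(0, n, i =>
    {
        for (int j = 0; j < p; j++)
        {
            double sum = 0.0;
            for (int k = 0; k < m; k++)
                sum += a[i, k] * b[k, j];
            result[i, j] = sum;
        }
    });

    return result;
}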

skjerns

Your code is doing matrix multiplication. There are faster algorithms for matrix multiplication, and what you're doing is the slow naive O(n^3) approach [technically O(n*m^2), depending on the row/column lengths]. On top of that, you allocate the result memory on every call, which isn't a good idea.

Resources for you:

Incidentally, if you want state-of-the-art desktop performance for this type of thing, you might want to look into CUDA: https://en.wikipedia.org/wiki/CUDA
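
To illustrate the allocation point above, here is a small sketch (the DotInto name and out-parameter style are my own, not code from the question) that lets the caller allocate the result once and reuse it across the 1000 iterations:

public static void DotInto(double[,] a, double[,] b, double[,] result)
{
    int n = a.GetLength(0), m = a.GetLength(1), p = b.GetLength(1);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < p; j++)
        {
            double sum = 0.0;
            for (int k = 0; k < m; k++)
                sum += a[i, k] * b[k, j];
            result[i, j] = sum; // overwrites the previous value, so the buffer can be reused
        }
}

// Usage: allocate the output once, then reuse it in the benchmark loop.
// var result = new double[W.GetLength(0), x.GetLength(1)];
// for (int i = 0; i < 1000; i++)
//     DotInto(W, x, result);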

keith
  • Thanks! I'm new to matrix operations and their optimization, so I'm going to study them, and CUDA as well. – snaga Feb 08 '17 at 17:01

Make your C# code do what the Python code does: know when your language can't keep up with the big dogs, and when that happens, call out to the native code in the resident BLAS subsystem for high-performance, parallel, natively optimized matrix math ops.

The resident BLAS subsystem is wrapped by a standard API. Your C# code will call the API, but will not know -- not knowing is a good thing! -- which particular BLAS subsystem is currently installed on the host.

I like OpenBLAS. Other people like Intel MKL(?). Still others like ATLAS. I hate ATLAS.
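
As one hedged example of doing this from C#: with the Math.NET Numerics package plus one of its native provider packages installed, something along these lines should route the multiplication to native BLAS (the exact provider call, e.g. Control.UseNativeMKL(), depends on which provider package you install):

using System;
using System.Diagnostics;
using MathNet.Numerics;
using MathNet.Numerics.LinearAlgebra;

class BlasDemo
{
    static void Main()
    {
        // Try to switch to the native MKL provider (assumes an MKL provider package);
        // wrapped in try/catch in case the native binaries cannot be loaded,
        // in which case the managed provider is used.
        try { Control.UseNativeMKL(); } catch { /* managed fallback */ }

        var W = Matrix<double>.Build.Random(784, 100);
        var x = Matrix<double>.Build.Random(100, 784);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000; i++)
        {
            var product = x * W; // 100x784 times 784x100 -> 100x100, computed by the provider
        }
        sw.Stop();
        Console.WriteLine($"1000 multiplications: {sw.ElapsedMilliseconds} ms");
    }
}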

Geoffrey Anderson
  • > Know when your language can't keep up with the big dogs, and when that happens, call out to the native code in the resident BLAS subsystem for high performance parallel native optimized matrix math ops. Yeah, actually, that's exactly what I wanted to learn, and I think it's time to learn BLAS and how to call it from C#. Thanks! – snaga Feb 08 '17 at 16:53

If you need a practical solution, use existing libraries.

If you are doing this for entertainment/educational purposes:

  • Eliminate all function calls from the innermost loop (GetLength) - such calls can't be hoisted out and cause a significant slowdown. The outer loops may benefit from the same optimization, but the gain there is much smaller.

  • Transpose the second matrix first so the inner loop accesses sequential elements of both arrays.

  • Try arrays of arrays (jagged arrays, double[][]) instead of a 2D array (double[,]).

  • When using arrays of arrays, use Length as the bound of the inner loop - this may eliminate bounds checks on at least one array.

  • Parallelize the outermost loop with Parallel.For / Parallel.ForEach (these tips are combined in the sketch below).

  • If the actual problem calls for more than one multiplication of non-square matrices, see https://en.wikipedia.org/wiki/Matrix_chain_multiplication

Also, use Stopwatch to measure time - see Exact time measurement for performance testing
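
Putting several of those tips together, a rough sketch might look like the following (my own illustration, assuming the matrices are stored as jagged double[][] arrays rather than the question's double[,]):

using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class FastDot
{
    // a is n x m, b is m x p, both stored as jagged arrays.
    public static double[][] Dot(double[][] a, double[][] b)
    {
        int n = a.Length, m = b.Length, p = b[0].Length;

        // Transpose b once so the inner loop walks both arrays sequentially.
        var bT = new double[p][];
        for (int j = 0; j < p; j++)
        {
            bT[j] = new double[m];
            for (int k = 0; k < m; k++)
                bT[j][k] = b[k][j];
        }

        var result = new double[n][];
        // Parallelize the outermost loop; each iteration writes only its own row.
        Parallel.For(0, n, i =>
        {
            var row = a[i];
            var resRow = new double[p];
            for (int j = 0; j < p; j++)
            {
                var col = bT[j];
                double sum = 0.0;
                // Using row.Length as the bound helps the JIT elide bounds checks.
                for (int k = 0; k < row.Length; k++)
                    sum += row[k] * col[k];
                resRow[j] = sum;
            }
            result[i] = resRow;
        });
        return result;
    }
}

// Timing with Stopwatch instead of DateTime:
// var sw = Stopwatch.StartNew();
// for (int i = 0; i < 1000; i++) FastDot.Dot(W, x);
// sw.Stop();
// Console.WriteLine(sw.Elapsed);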

Alexei Levenkov
  • Thanks for the great tips! Yeah, I'm new to C# and neural networks/deep learning, so this is a toy project for me to learn both C# and neural network/deep learning algorithms. I'm looking for performance tips for implementing numeric algorithms in C#. I'd like to try your tips ASAP. Thanks again! – snaga Feb 08 '17 at 16:58