I am working on a project where I have to loop through large arrays (lists), accessing each item by index. Usually this involves checking each element against a condition and then, depending on the result, updating its value.
I've noticed that this is extremely slow compared to, for example, doing a similar thing in C#. Here is a sample of simply looping through arrays and reassigning each value:
C#:
var a = new double[10000000];
var watch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < a.Length; i++)
{
    a[i] = 1.0;
}
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
//About 40ms
Python:
import time

a = []
for i in range(0, 10000000):
    a.append(0.0)

t1 = time.clock()
for i in range(0, 10000000):
    a[i] = 1.0
t2 = time.clock()
totalTime = t2 - t1
# About 900ms
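(As an aside, I assume timing the same loop with the timeit module would look roughly like the sketch below - the number above is still from time.clock():)

import timeit

setup = "a = [0.0] * 10000000"
stmt = """
for i in range(10000000):
    a[i] = 1.0
"""
# number=1 runs the statement once and returns the elapsed time in seconds
print(timeit.timeit(stmt, setup=setup, number=1))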
The Python code here seems to be over 20x slower. I am relatively new to Python, so I can't judge whether this kind of performance is "normal" or whether I am doing something horribly wrong here. I am running Anaconda as my Python environment, and PyCharm is my IDE.
Note: I have tried using nditer on NumPy arrays with no significant performance increase.
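(By "using nditer" I mean something along these lines - a sketch applied to the fill loop above, not necessarily the exact code I ran:)

import numpy as np

a = np.zeros(10000000)

# op_flags=['readwrite'] makes each element writable; x[...] assigns back into the array
for x in np.nditer(a, op_flags=['readwrite']):
    x[...] = 1.0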
Many thanks in advance for any tips!
UPDATE: I've just compared the following two approaches:
import numpy as np

# timeit: 43ms
a = np.random.randn(1000, 1000)
a[a < 0] = 100.0

# timeit: 1650ms
a = np.random.randn(1000, 1000)
for x in np.nditer(a, op_flags=['readwrite']):
    if x < 0:
        x[...] = 100.0
Looks like the first (vectorized) approach is the way to go here...
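(Applying the same idea back to the original fill loop would, I assume, look something like the sketch below - I haven't timed it with the setup above:)

import numpy as np

# Fill: one whole-array assignment instead of a Python-level index loop
a = np.zeros(10000000)
a[:] = 1.0

# Conditional update: np.where builds a new array; the boolean mask above modifies in place
b = np.random.randn(1000, 1000)
b = np.where(b < 0, 100.0, b)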