
Below is a simple computation coded in both C# and Python (for those curious, it is a solution to Problem 5 of Project Euler).

My question is why the C# code below finishes in only 9 seconds, while the Python code takes 283 seconds (to be exact, 283 seconds on Python 3.4.3, 64-bit, and 329 seconds on Python 2.7.9, 32-bit).

So far, I've coded similar routines in both C# and Python, and the execution times were comparable. This time, however, the difference is extreme.

I think part of the difference arises from Python's dynamic typing (I suspect Python converts some of the variables to double), but a gap this large is still hard to explain.

What am I doing wrong?

My system: Windows 7, 64-bit

C# - VS Express 2012 (9 seconds)

Python 3.4.3, 64-bit (283 seconds)

Python 2.7.9, 32-bit (329 seconds)

C# code:

using System;

namespace bug_vcs {
    class Program {
        public static void Main(string[] args) {
            DateTime t0 = DateTime.Now;
            int maxNumber = 20;
            bool found = false;
            long start = maxNumber;
            while (!found) {
                found = true;
                int i = 2;
                while ((i < maxNumber + 1) && found) {
                    if (start % i != 0) {
                        found = false;
                    }
                    i++;
                }
                start++;
            }
            Console.WriteLine("{0:d}", start - 1);
            Console.WriteLine("time elapsed = {0:f} sec.", (DateTime.Now - t0).Seconds);
            Console.ReadLine();
        }
    }
}

and the Python code:

from datetime import datetime

t0 = datetime.now()
max_number = 20
found = False
start = max_number
while not found:
    found = True
    i = 2
    while ((i < max_number + 1) and found):
        if (start % i) != 0:
            found = False
        i += 1
    start += 1

print("number {0:d}\n".format(start - 1))

print("time elapsed = {0:f} sec.\n".format((datetime.now() - t0).seconds))
ssd
  • You should use `StopWatch` instead of `DateTime` for calculating execution times in C#. – juharr Apr 27 '15 at 18:30
  • And `timeit` for Python (a minimal sketch follows these comments). – jonrsharpe Apr 27 '15 at 18:33
  • In response to @juharr's comment: with C#, I barely have time to step away from my desk. On the other hand, I can easily have a cup of coffee two floors downstairs while Python is iterating. – ssd Apr 27 '15 at 18:37
  • Also here's a good blog about the solution to [Project Euler Problem 5](http://www.mathblog.dk/project-euler-problem-5/) – juharr Apr 27 '15 at 18:38
  • @merkez3110 I'm not saying that is why there is a difference (otherwise I would have made it an answer). I'm just commenting that the correct way to time code in C# is with `StopWatch`. – juharr Apr 27 '15 at 18:40
  • Okay, I'm not after the solution; both of these already produce it. What confuses me is the run-time difference. I might have missed some tricky point in Python; that's what I'm after. – ssd Apr 27 '15 at 18:40
  • While it doesn't answer your question, adding a break (for the inner while loop) after "found=false" should help your performance in both languages as you don't have to keep checking things after you have proven that it isn't valid. – Martin Noreke Apr 27 '15 at 18:44
  • Optimized code created by the C# compiler vs straight interpretation on Python. They're very different tools created for different situations. – David Crowell Apr 27 '15 at 18:44
  • I tested both and I think the time difference is indeed huge. – Cristian Lupascu Apr 27 '15 at 18:49
  • This looks like it's computing the least common multiple of the numbers 1-20. There are much faster ways to compute that, such as the Euclidean algorithm. – Kyle Apr 27 '15 at 18:50
  • @Kyle see the comments above. It's not about finding the optimal solution, but explaining the dramatic execution time difference between these seemingly identical implementations. – Cristian Lupascu Apr 27 '15 at 18:54
  • @w0lf Oh, I know, that's why it's only a comment and not an answer. – Kyle Apr 27 '15 at 19:03
  • Try it with `pypy` (http://pypy.org): `python3.4` -> 286 sec, `pypy` -> 10 sec. – Ned Deily Apr 27 '15 at 19:14
  • On Linux (a different and comparatively slow machine) the Python code finished execution in 257 seconds. – ssd Apr 27 '15 at 19:24
  • With my limited Cython knowledge I got it to run in 5 seconds: http://pastebin.com/dArFNNsh – Padraic Cunningham Apr 27 '15 at 19:37
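As the comments above suggest, `Stopwatch` (C#) and `timeit` (Python) are the idiomatic tools for timing code. Here is a minimal `timeit` sketch; the helper function is a hypothetical stand-in for the loop in the question, not code taken from it:

import timeit

# Hypothetical stand-in for the question's loop: find the smallest number
# divisible by every integer from 2 to max_number.
def smallest_multiple(max_number):
    start = max_number
    while any(start % i for i in range(2, max_number + 1)):
        start += 1
    return start

# Time a single run; a smaller bound is used so the sketch finishes quickly.
elapsed = timeit.timeit(lambda: smallest_multiple(10), number=1)
print("time elapsed = {0:f} sec.".format(elapsed))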

6 Answers


The answer is simply that Python deals with objects for everything and that it doesn't have a JIT by default. So rather than efficiently modifying a few bytes on the stack and optimizing the hot parts of the code (i.e., the iteration), Python chugs along with rich objects representing the numbers and no on-the-fly optimizations.
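To make the "objects for everything" point concrete, here is a minimal sketch (the exact byte counts vary between CPython builds):

import sys

# In CPython even a small integer is a full heap object (type pointer,
# reference count, digit storage), not a bare machine word.
print(sys.getsizeof(0))         # typically 24-28 bytes on a 64-bit build
print(sys.getsizeof(10 ** 30))  # arbitrary precision: bigger values use more

# Every arithmetic result is a fresh object rather than an in-place update
# of a few bytes on the stack.
a = 10 ** 10
b = a + 1
c = a + 1
print(b is c)                   # False: the two additions produced two objects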

If you tried this in a variant of Python that has JIT (for example, PyPy) I guarantee you that you'll see a massive difference.

A general tip is to avoid standard Python for very computationally expensive operations (especially if this is for a backend serving requests from multiple clients). Java, C#, JavaScript, etc. with JIT are incomparably more efficient.

By the way, if you want to write your example in a more Pythonic manner, you could do it like this:

from datetime import datetime
start_time = datetime.now()

max_number = 20
x = max_number
while True:
    i = 2
    while i <= max_number:
        if x % i: break
        i += 1
    else:
        # x was not divisible by 2...20
        break
    x += 1

print('number:       %d' % x)
print('time elapsed: %d seconds' % (datetime.now() - start_time).seconds)

The above executed in 90 seconds for me. The reasons it's faster come down to seemingly trivial things: `x` being shorter than `start`, not assigning variables as often, and relying on Python's own control structures (`break` and the loop `else` clause) rather than flag-variable checking to jump in and out of loops.

Blixt
  • In the meantime, I've downloaded `pypy` and tried my original code; execution time dropped to 10 seconds. Then I tried your code (again on PyPy) and the time improved to 5 seconds. So this has something to do with Python itself. – ssd Apr 27 '15 at 20:04
  • @merkez3110: Yep, as I hinted at in my answer, PyPy has a JIT which essentially optimizes the code on a much lower level. This eliminates a lot of issues with Python, such as instead of performing math on objects with operator methods, it performs math directly on integers that are on the stack (as C# would). Speaking of stack, vanilla Python stores variables in a dict, hence why even changing `start` to `x` improves the performance in vanilla Python. – Blixt Apr 27 '15 at 20:10
  • Actually on second thought, I might be mixing up Python with other interpreted languages in regards to variable names affecting performance. It's possible that Python optimizes the variable lookup in the interpretation step so that variable names don't affect performance. Either way, the rest of my post should be accurate. – Blixt Apr 27 '15 at 20:18
  • I've found that `while some_expression:` loops can almost always be sped up by using `for item in some_iterable:` or `for i in range(...):`. In this case, `for i in range(2, max_number + 1):` worked well. A smaller improvement came from rewriting the outer `while True` loop using `for x in itertools.count(max_number):` (sketched just below). – Kevin J. Chase Apr 27 '15 at 21:19
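A sketch of the rewrite described in that last comment (my code, not Kevin's), using `range()` and `itertools.count()` in place of the hand-rolled counters:

from itertools import count

def smallest_multiple(max_number=20):
    for x in count(max_number):              # replaces the outer "while True" loop
        for i in range(2, max_number + 1):   # replaces the inner counter loop
            if x % i:
                break                        # x is not divisible by i
        else:
            return x                         # divisible by every i in 2..max_number

print(smallest_multiple(10))                 # 2520; smallest_multiple(20) gives 232792560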

Try Python JIT implementations like PyPy and Numba, or Cython, if you want speed close to C at the cost of a bit of code readability.

e.g. in PyPy:

# PyPy

number 232792560

time elapsed = 4.000000 sec.

e.g. in Cython:

# Cython

number 232792560

time elapsed = 1.000000 sec.

Cython Source:

from datetime import datetime

cpdef void run():
    t0 = datetime.now()
    cdef int max_number = 20
    found = False
    cdef int start = max_number
    cdef int i
    while not found:
        found = True
        i = 2
        while ((i < max_number + 1) and found):
            if (start % i) != 0:
                found = False
            i += 1
        start += 1

    print("number {0:d}\n".format(start - 1))

    print("time elapsed = {0:f} sec.\n".format((datetime.now() - t0).seconds))
Mark

TL;DR: A long-winded post in which I try to defend Python (my language of choice) against C#. In this example, C# performs better, though it takes more lines of code to do the same amount of work; in the end, C# is ~5x faster than a similar approach in Python when both are coded correctly. The takeaway is that you should use the language that suits you.

When I ran the C# example, it took about 3 seconds to complete on my machine and gave a result of 232,792,560. It can be optimized using the fact that a number divisible by everything from 1 to 20 must itself be a multiple of 20, so you can increment by 20 instead of 1. That single optimization made the code execute ~10x faster, in a mere 353 milliseconds.

When I ran the Python example, I gave up waiting and tried to write my own version using itertools, which didn't fare much better, taking about as long as your example. Then I hit upon an acceptable itertools version by taking into account that only multiples of the largest number can be divisible by all numbers from smallest to largest. The refined Python (3.6) code is below, with a timing decorator that prints the number of seconds it took to execute:

import time
from itertools import count, filterfalse


def timer(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        res = func(*args, **kwargs)
        print(time.time() - start)
        return res
    return wrapper


@timer
def test(stop):
    return next(filterfalse(lambda x: any(x%i for i in range(2, stop)), count(stop, stop)))


print("Test Function")
print(test(20))
# 11.526668787002563
# 232792560

This also reminded me of a question I recently had to answer on CodeFights: computing the least common multiple using the greatest common divisor function in Python. That code is as follows:

import time
from fractions import gcd  # deprecated since Python 3.5; math.gcd is the modern replacement
from functools import reduce


def timer(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        res = func(*args, **kwargs)
        print(time.time() - start)
        return res
    return wrapper


@timer
def leastCommonDenominator(denominators):
    return reduce(lambda a, b: a * b // gcd(a, b), denominators)


print("LCM Function")
print(leastCommonDenominator(range(1, 21)))
# 0.001001596450805664
# 232792560

As with most programming tasks, the simplest approach isn't always the fastest. Unfortunately, that really stuck out when attempted in Python this time. That said, the beauty of Python is how simply you can get a performant implementation: where the straightforward version took 10 lines of C#, I was able to return the correct answer in (potentially) a one-line lambda expression, roughly 300 times faster than my simple optimization in C#. I'm no specialist in C#, but after implementing the same approach there, here is the code I used and its result (about 5x faster than the Python version):

using System;
using System.Diagnostics;

namespace ConsoleApp1
{
    class Program
    {
        public static void Main(string[] args)
        {
            Stopwatch t0 = new Stopwatch();
            int maxNumber = 20;

            long start;
            t0.Start();
            start = Orig(maxNumber);
            t0.Stop();

            Console.WriteLine("Original | {0:d}, {1:d}", maxNumber, start);
            // Original | 20, 232792560
            Console.WriteLine("Original | time elapsed = {0}.", t0.Elapsed);
            // Original | time elapsed = 00:00:02.0585575

            t0.Restart();
            start = Test(maxNumber);
            t0.Stop();

            Console.WriteLine("Test | {0:d}, {1:d}", maxNumber, start);
            // Test | 20, 232792560
            Console.WriteLine("Test | time elapsed = {0}.", t0.Elapsed);
            // Test | time elapsed = 00:00:00.0002763

            Console.ReadLine();
        }

        public static long Orig(int maxNumber)
        {
            bool found = false;
            long start = 0;
            while (!found)
            {
                start += maxNumber;
                found = true;
                for (int i=2; i < 21; i++)
                {
                    if (start % i != 0)
                        found = false;
                }
            }
            return start;
        }

        public static long Test(int maxNumber)
        {
            long result = 1;

            for (long i = 2; i <= maxNumber; i++)
            {
                result = (result * i) / GCD(result, i);
            }

            return result;
        }

        public static long GCD(long a, long b)
        {
            while (b != 0)
            {
                long c = b;
                b = a % b;
                a = c;
            }

            return a;
        }
    }
}

For most higher-level tasks, however, I usually see Python doing exceptionally well compared to a .NET implementation, though I can't substantiate the claims beyond saying that the Python Requests library has given me as much as double to triple the performance of a C# WebRequest written the same way. The same was true when writing Selenium processes: I could read text elements in Python in 100 milliseconds or less, while each element retrieval in C# took more than a second to return. That said, I actually prefer the C# implementation because of its object-oriented approach, whereas Python's Selenium implementation takes a functional style that gets very hard to read at times.

Solonotix

As some people have said, the best option is to use a JIT implementation. I know this is an old topic, but I was curious about the difference in execution time between the implementations, so I ran some tests in a Jupyter notebook with Numba and Cython. These were my results:

Your code inside a function

%%time

def test():
    max_number = 20
    found = False
    start = max_number
    while not found:
        found = True
        i = 2
        while ((i < max_number + 1) and found):
            if (start % i) != 0:
                found = False
            i += 1
        start += 1
    return start-1
test()

CPU times: user 1min 18s, sys: 462 ms, total: 1min 19s Wall time: 1min 21s

The way @Blixt wrote it

%%time

def test():
    max_number = 20
    x = max_number
    while True:
        i = 2
        while i <= max_number:
            if x % i: break
            i += 1
        else:
            # x was not divisible by 2...20
            break
        x += 1
    
    return x
    
test()

CPU times: user 40.1 s, sys: 305 ms, total: 40.4 s Wall time: 41.9 s

Numba

%%time
from numba import jit

@jit(nopython=True)
def test():
    max_number = 20
    x = max_number
    while True:
        i = 2
        while i <= max_number:
            if x % i: break
            i += 1
        else:
            # x was not divisible by 2...20
            break
        x += 1
    
    return x
    
test()

CPU times: user 4.48 s, sys: 70.5 ms, total: 4.55 s Wall time: 5.01 s

Numba with function signature

%%time
from numba import jit, int32

@jit(int32())
def test():
    max_number = 20
    x = max_number
    while True:
        i = 2
        while i <= max_number:
            if x % i: break
            i += 1
        else:
            # x was not divisible by 2...20
            break
        x += 1
    
    return x
    
test()

CPU times: user 3.56 s, sys: 43.1 ms, total: 3.61 s Wall time: 3.79 s

Cython

%load_ext Cython
%%time
%%cython

def test():
    cdef int max_number = 20
    cdef int x = max_number
    cdef int i = 2
    while True:
        i = 2
        while i <= max_number:
            if x % i: break
            i += 1
        else:
            # x was not divisible by 2...20
            break
        x += 1
    
    return x
    
test()

CPU times: user 617 ms, sys: 20.7 ms, total: 637 ms Wall time: 1.31 s

GuiGNG

Python (and all scripting languages, including MATLAB) is not intended to be used directly for large-scale numerical calculation. To get results comparable to compiled programs, avoid loops at all costs and convert the formulas into matrix/array form (this takes a little mathematical understanding and skill), so that as much work as possible is pushed down to the underlying C libraries provided by numpy, scipy, etc.

Again, DO NOT write loops for numerical calculation in Python whenever a matrix equivalent is possible!
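For this particular problem, a minimal sketch of that advice (assuming NumPy >= 1.15, where the `np.lcm` ufunc is available): the number being searched for is exactly the least common multiple of 1..20, which NumPy can reduce over without any Python-level loop.

import numpy as np

# Least common multiple of 1..20, computed entirely inside NumPy's C code.
# int64 avoids any risk of 32-bit overflow in intermediate values.
values = np.arange(1, 21, dtype=np.int64)
print(np.lcm.reduce(values))   # 232792560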

Ralph B.

First of all, you need to change the algorithm to solve this problem:

#!/usr/bin/env python

import sys
from timeit import default_timer as timer

pyver = sys.version_info

print(">")
print(">  Smallest multiple of 2 ... K");
print(">")
print(">  Python version, interpreter version: {0}.{1}.{2}-{3}-{4}".format(
    pyver.major, pyver.minor, pyver.micro, pyver.releaselevel, pyver.serial))
print(">")

K = 20

print("  K = {0:d}".format(K))
print("")

t0 = timer()

N = K
NP1 = N + 1
N2 = (N >> 1) + 1
vec = range(0, NP1)  # Python 2: range() returns a plain list that can be modified in place
smalestMultiple = 1

for i in range(2, N2):
    divider = vec[i]
    if divider == 1:
        continue
    for j in range(i << 1, NP1, i):
        if (vec[j] % divider) == 0:
            vec[j] /= divider  # Python 2: "/" on ints is integer division

for i in range(2, NP1):
    if vec[i] != 1:
        smalestMultiple = smalestMultiple * vec[i]

t1 = timer()

print("  smalest multiple = {0:d}".format(smalestMultiple))
print("  time elapsed = {0:f} sec.".format(t1 - t0))

Output on Linux/Fedora 28/Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz:

>  Smallest multiple of 2 ... K
>
>  Python version, interpreter version: 2.7.15-final-0
>
>  K = 20
>
>  smalest multiple = 232792560
>  time elapsed = 0.000032 sec.
  • I don't need a fast algorithm. My question was why seemingly identical algorithms run in such different times (a very large difference) in different programming languages. I'm aware that this code (https://codeshare.io/2W4NBb) runs very fast indeed. – ssd Jan 01 '19 at 15:10