5

I want to speed up an existing application and am looking for advice on my options. The application is written in Python, uses wxPython, and is packaged with py2exe (I only target Windows). Parts of the application are computationally intensive and run too slowly in interpreted Python. I am not familiar with C, so porting parts of the code over is not really an option for me.

So my question is basically do I have a clear picture of my options as I outline below, or am I approaching this from the wrong direction?

  • Running with PyPy: Today I started experimenting with PyPy. The results are exciting: I can run large parts of the code from the PyPy interpreter and I'm seeing 5x+ speed improvements with no code changes. However, if I understand correctly, (a) PyPy with wxPython support is still a work in progress, and (b) I cannot compile it down to an exe for distribution anyway. So unless I'm mistaken, this seems like a no-go for me? Is there no way to package things up so that parts of the application are executed with PyPy?
  • Converting code to RPython, translating with PyPy: The next option seems to be rewriting parts of the code in PyPy's restricted language (RPython), which seems like a pretty large job. But if I do that, parts of the code can then be compiled to an executable (?) and then I can access the code through ctypes (?).
  • Other restricted options: Shedskin seems to be a popular alternative here; does it fit my requirements better? Other options seem to be Cpython, Psyco, and Unladen, but they are all superseded or no longer maintained.
misshapen

3 Answers

6

Using PyPy indeed rules out py2exe and similar tools, at least until one is ported (AFAIK there is no active work on that). Still, as PyPy binaries do not need to be installed, you might get away with a more complicated distribution that includes both your Python source code and a PyPy binary plus its stdlib, and uses a small wrapper (batch file, executable) to ease launching. I can't comment on whether wxPython on PyPy is mature enough to be used, but perhaps someone on pypy-dev, wxpython-dev or either one's IRC channel can give a recommendation if you describe your situation.
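To make the wrapper idea concrete, here is a minimal sketch of a launcher that builds the command line for a PyPy binary shipped next to the sources. The folder layout (`pypy/` and `src/`), the script name `main.py` and the function names are assumptions made up for illustration; nothing here is a PyPy or py2exe API, and a batch file would do the same job.

```python
import os
import subprocess


def build_pypy_command(app_dir, script_name, args=()):
    """Return the argv list that runs script_name with a bundled PyPy.

    Assumed (hypothetical) layout:
        app_dir/pypy/pypy.exe   the unpacked PyPy binary and its stdlib
        app_dir/src/main.py     the application's Python sources
    """
    exe = "pypy.exe" if os.name == "nt" else "pypy"
    pypy = os.path.join(app_dir, "pypy", exe)
    script = os.path.join(app_dir, "src", script_name)
    return [pypy, script] + list(args)


def launch(app_dir, script_name, args=()):
    """Run the application under the bundled PyPy and return its exit code."""
    return subprocess.call(build_pypy_command(app_dir, script_name, args))
```

A launcher script would then call something like `launch(os.path.dirname(os.path.abspath(__file__)), "main.py", sys.argv[1:])`. Whether the application itself runs correctly under PyPy is a separate question this sketch does not answer.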

Translating your code into RPython does not seem viable to me. The translation toolchain is not really a tool for general-purpose development, and producing a C DLL for embedding/ctypes seems nontrivial. Also, RPython code really is low-level; making your Python code restricted enough may amount to rewriting half of it.

As for other restricted options: You seem to mix up CPython (the original Python interpreter written in C) with Cython (a compiler for a Python-like language that emits C code suitable for CPython extension modules). Both projects are active. I'm not very familiar with Shedskin, but it seems to be a tool for developing whole programs, with little or no interaction with non-restricted Python code. Cython seems a much better fit: Although it requires manual type annotations and lower-level code to achieve really good performance, it's trivial to use from Python: The very purpose of the project is producing extension modules.

  • Thank you very much, I have accepted your answer. However, after a very productive afternoon playing with Cython, I think I've hit an insurmountable issue: Cython's lack of threading support. My computationally expensive calculations, which previously ran in threads and didn't lock the GUI, now freeze everything. From googling around, it seems this is by design: the way to get parallelism (http://docs.cython.org/src/userguide/parallelism.html#parallel) is to release the GIL, and all code in the GIL cannot use any native Python objects. Am I understanding this correctly? That would make Cython no good. – misshapen Jun 09 '12 at 16:41
  • @NickJ (1) I don't know if Cython supports Python threads - but I don't see why it couldn't. Maybe the alternative you link to is simply *preferred*, because it can use multiple CPUs. (2) I'm not sure if that's a typo on your part, but code protected by the GIL *would* be able to use Python types, whereas `nogil` sections cannot (because CPython, and thus all Python types, rely on the GIL). (3) Aside from that, you could get away without `threading` in Cython by putting whatever you want to parallelize into relatively-pure Cython functions and kicking off the thread via Python. –  Jun 09 '12 at 17:48
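Point (3) in the comment above could look roughly like the following sketch: the thread is created with Python's `threading` module and only the work function would come from the compiled module. `heavy_compute` is a pure-Python stand-in for a hypothetical Cython function, and `run_in_background` is an illustrative helper, not part of any library. Note the assumption: a compiled function keeps the GUI responsive only if it releases the GIL while it works (in Cython, inside a `nogil` section); otherwise other threads wait until it returns.

```python
import queue
import threading


def heavy_compute(n):
    # Stand-in for a function from a Cython extension module (hypothetical).
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_background(func, args, results):
    """Run func(*args) in a worker thread and put its result on a queue."""
    def worker():
        results.put(func(*args))

    thread = threading.Thread(target=worker)
    thread.daemon = True  # do not keep the process alive for the worker
    thread.start()
    return thread


results = queue.Queue()
worker_thread = run_in_background(heavy_compute, (1000,), results)
worker_thread.join()  # a GUI would poll the queue from a timer instead
print(results.get())
```

In a wxPython application the blocking `join()` would be replaced by something that hands the result back to the GUI thread, for example polling the queue from a timer.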
3

I would definitely look into Cython. I've been playing with it some and have seen speedups of ~100x over pure Python. Use the profile module to find the bottlenecks first. Loops usually offer the biggest speed gains when moving to Cython. You should also check whether you can use array/vector operations in NumPy instead of loops; if so, that can also give extreme performance boosts. For instance:

a = list(range(1000000))  # a real list, so items can be reassigned in Python 3
for i in range(len(a)):
    a[i] += 5

is slow, real slow. On the other hand:

import numpy
a = numpy.arange(1000000)
a = a + 5  # adds 5 to every element in one vectorized operation

is fast, real fast.
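As for finding the bottlenecks first, a minimal sketch using the standard library's `cProfile` and `pstats` modules might look like this. `slow_loop` mirrors the slow example above, and `profile_call` is a hypothetical helper name, not part of the profiler's API.

```python
import cProfile
import io
import pstats


def slow_loop(n):
    # Same pattern as the slow example: element-by-element updates in Python.
    a = list(range(n))
    for i in range(len(a)):
        a[i] += 5
    return a


def profile_call(func, *args):
    """Profile func(*args) and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


print(profile_call(slow_loop, 100000))
```

The functions near the top of the report are the first candidates for Cython or NumPy; everything else can usually stay as plain Python.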

reptilicus
0

Correction: Shedskin can be used to generate extension modules, as well as whole programs.

user876508