
I am working on a theoretical graph theory problem which involves taking combinations of hyperedges in a hypergraph to analyse the various cases.

I have implemented an initial version of the main algorithm in Python, but due to its combinatorial structure (and probably my implementation) the algorithm is quite slow.

One way I am considering speeding it up is by using either PyPy or Cython.

Looking at the documentation, it seems Cython doesn't offer much speedup when it comes to tuples. This might be problematic for my implementation, since I represent hyperedges as tuples, so the majority of the algorithm consists of manipulating tuples (they are all the same length, around 6 elements each).

Since both my C and Python skills are quite minimal, I would appreciate advice on the best way to proceed in optimising the code, given its reliance on tuples/lists. Is there documentation on using lists/tuples with Cython (or PyPy)?
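
For concreteness, here is a minimal sketch of the kind of code I mean (the data is made up; only the shapes match my real code):

```python
from itertools import combinations

# Hyperedges are fixed-length tuples of vertex labels, around 6 entries each.
hyperedges = [
    (0, 1, 2, 3, 4, 5),
    (1, 2, 3, 4, 5, 6),
    (2, 3, 4, 5, 6, 7),
]

# The hot part of the algorithm: looping over combinations of hyperedges
# and manipulating the tuples, e.g. finding shared vertices.
for e1, e2 in combinations(hyperedges, 2):
    shared = set(e1) & set(e2)
```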

nsimplex
  • Can you post your code and highlight the parts that are slow? It's hard to suggest improvements without seeing the code, since the problem might not be what you think it is. In _general_, the best answer to improving speed is to think of a better algorithm... – Roland Smith Mar 31 '13 at 16:07
  • Cython can work with C arrays and structs, and lets you define extension types. Any of those could be an alternative to tuples. – Janne Karila Mar 31 '13 at 16:17
  • @Roland, the problem is actually NP-hard (it is related to matching in hypergraphs), so I can't hope for an asymptotically better algorithm than the one I have already implemented. However, I am only interested in a very specific case. I estimate from the running time of my naive Python implementation that if I can make it run 100x faster, it would finish in an acceptable amount of time (about 2 weeks). – nsimplex Apr 03 '13 at 09:05
  • Thanks Janne, do you have any examples or pointers (excuse the pun!) on how to include C arrays in Python code via Cython? (A sketch of this idea appears after these comments.) – nsimplex Apr 03 '13 at 09:07
  • @nsimplex: Can you show the code, so we might suggest improvements? There are some standard tricks that can help, like replacing loops by comprehensions, and trading space for time... – Roland Smith Apr 03 '13 at 15:54
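
A minimal Cython sketch of Janne Karila's suggestion, assuming all hyperedges are length-6 tuples of ints (the file, function, and constant names here are hypothetical, not from the asker's code):

```cython
# hyperedge_ops.pyx -- a hypothetical sketch, not the asker's real code.
# Hyperedges become fixed-size C int arrays instead of Python tuples,
# so the inner loops run at C speed.

DEF EDGE_LEN = 6  # all hyperedges have the same length

cdef int shared_vertices(int* e1, int* e2):
    # Count vertices common to both edges; O(EDGE_LEN**2) is cheap for len 6.
    cdef int i, j, count = 0
    for i in range(EDGE_LEN):
        for j in range(EDGE_LEN):
            if e1[i] == e2[j]:
                count += 1
    return count

def shared(tuple t1, tuple t2):
    # Convert the Python tuples to C arrays once at the boundary,
    # then do all the work in pure C.
    cdef int e1[EDGE_LEN]
    cdef int e2[EDGE_LEN]
    cdef int i
    for i in range(EDGE_LEN):
        e1[i] = t1[i]
        e2[i] = t2[i]
    return shared_vertices(e1, e2)
```

The general pattern is to pay the tuple-to-C conversion cost once per call and keep the hot loops free of Python objects.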

2 Answers


If your algorithm is bad in terms of computational complexity, no tool can save you; you need a better algorithm. Consult a good graph theory book or Wikipedia; a suitable algorithm is usually relatively easy to find, although some problems have algorithms that are both non-trivial and very hard to implement. This sounds like the kind of code that PyPy can speed up quite significantly, but only by a constant factor, and without requiring any modifications to your code. Cython, on the other hand, does not speed your code up much without type declarations, and it seems like this sort of problem cannot really be sped up just by adding types.

The constant part is what's crucial here: if the algorithm's complexity grows like, say, 2^n (which is typical for a naive algorithm), then adding an extra node to the graph doubles your running time. That means 10 extra nodes multiply the time by 1024, 20 extra nodes by 1024*1024, etc. If you're super lucky, PyPy can speed up your algorithm by 100x, but that factor stays constant regardless of the graph size (and one way or another you quickly run out of time in the universe).
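
To make the arithmetic concrete, a toy illustration (the numbers are abstract "time units", not measurements):

```python
# A constant 100x speedup only shifts an exponential curve; it does
# not flatten it.
for n in (20, 30, 40):
    naive = 2 ** n            # cost of a hypothetical 2**n algorithm
    with_pypy = naive // 100  # after a hypothetical 100x speedup
    print(n, naive, with_pypy)

# Each extra 10 nodes multiplies both columns by 1024; the 100x factor
# buys only about log2(100), i.e. fewer than 7, extra nodes.
```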

fijal
  • Thanks fijal. Indeed, all I need is a 100x speedup. My question is related to the point you raise concerning data types: how do I declare the Python tuples in C to get maximum speedup? Any examples of how to do so would be much appreciated. – nsimplex Apr 03 '13 at 09:13
  • you're confusing things. A lot. Just use PyPy. PyPy is not like Cython - you don't have to define anything, it should just work. – fijal Apr 04 '13 at 07:47

_what would be the best way to proceed in optimising the code..._

Profile first. There is a standard cProfile module that does simple profiling very well. Optimising your code before profiling is quite pointless.
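
For example, a minimal profiling sketch (`main()` here is a stand-in for your actual entry point):

```python
import cProfile
import pstats

# Profile the whole run and save the raw statistics to a file.
cProfile.run("main()", "profile.out")  # main() is a stand-in for your code

# Print the 10 most expensive functions by cumulative time.
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)
```

You can also profile a whole script from the command line with `python -m cProfile yourscript.py`.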

Besides, for graphs you can try the excellent networkx module. Also, if you deal with long sorted lists, have a look at the bisect and heapq modules.
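
For instance, a generic bisect illustration (not tied to the question's data):

```python
import bisect

def contains(sorted_list, x):
    # O(log n) lookup in a sorted list, versus O(n) for `x in sorted_list`.
    i = bisect.bisect_left(sorted_list, x)
    return i < len(sorted_list) and sorted_list[i] == x

print(contains([1, 3, 7, 9, 12], 9))  # True
print(contains([1, 3, 7, 9, 12], 8))  # False
```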

Jakub M.
  • Thanks Jakub. The profiler is useful; I did run it before, and through using it I estimate I need to make one function 100x faster to get the code to complete in an acceptable time for my purpose. I also took a look at the graph theory packages you mentioned, and indeed they are excellent. However, I didn't use them because the only graph theory algorithm I needed was matching in hypergraphs, and I had to implement it in a specific way so as to be able to make use of a few shortcuts. – nsimplex Apr 03 '13 at 09:10