4

I have a large Python code base which we recently started compiling with Cython. Without making any changes to the code, I expected performance to stay about the same, but we planned to optimize heavier computations with Cython specific code after profiling. However, the speed of the compiled application plummeted and it appears to be across the board. Methods are taking anywhere from 10% to 300% longer than before.

I've been playing around with test code to try and find things Cython does poorly and it appears that string manipulation is one of them. My question is, am I doing something wrong or is Cython really just bad at some things? Can you help me understand why this is so bad and what else Cython might do very poorly?

EDIT: Let me try to clarify. I realize that this type of string concatenation is very bad; I just noticed it has a huge speed difference so I posted it (probably a bad idea). The codebase doesn't have this type of terrible code but has still slowed dramatically and I'm hoping for pointers on what type of constructs Cython handles poorly so I can figure out where to look. I've tried profiling but it was not particularly helpful.

For reference, here is my string manipulation test code. I realize the code below is terrible and useless, but I'm still shocked by the speed difference.

# pyCode.py
def str1():
    val = ""
    for i in xrange(100000):
        val = str(i)

def str2():
    val = ""
    for i in xrange(100000):
        val += 'a'

def str3():
    val = ""
    for i in xrange(100000):
        val += str(i)

Timing code

# compare.py
import timeit

pyTimes = {}
cyTimes = {}

# STR1
number=10

setup = "import pyCode"
stmt = "pyCode.str1()"
pyTimes['str1'] = timeit.timeit(stmt=stmt, setup=setup, number=number)

setup = "import cyCode"
stmt = "cyCode.str1()"
cyTimes['str1'] = timeit.timeit(stmt=stmt, setup=setup, number=number)

# STR2
setup = "import pyCode"
stmt = "pyCode.str2()"
pyTimes['str2'] = timeit.timeit(stmt=stmt, setup=setup, number=number)

setup = "import cyCode"
stmt = "cyCode.str2()"
cyTimes['str2'] = timeit.timeit(stmt=stmt, setup=setup, number=number)

# STR3
setup = "import pyCode"
stmt = "pyCode.str3()"
pyTimes['str3'] = timeit.timeit(stmt=stmt, setup=setup, number=number)

setup = "import cyCode"
stmt = "cyCode.str3()"
cyTimes['str3'] = timeit.timeit(stmt=stmt, setup=setup, number=number)

for funcName in sorted(pyTimes.viewkeys()):
    print "PY {} took {}s".format(funcName, pyTimes[funcName])
    print "CY {} took {}s".format(funcName, cyTimes[funcName])

Compiling a Cython module with

cp pyCode.py cyCode.py
cython cyCode.py
gcc -O2 -fPIC -shared -I$PYTHONHOME/include/python2.7 \
    -fno-strict-aliasing -fno-strict-overflow -o cyCode.so cyCode.c

Resulting timings

> python compare.py 
PY str1 took 0.1610019207s
CY str1 took 0.104282140732s
PY str2 took 0.0739600658417s
CY str2 took 2.34380102158s
PY str3 took 0.224936962128s
CY str3 took 21.6859738827s

For reference, I've tried this with Cython 0.19.1 and 0.23.4. I've compiled the C code with gcc 4.8.2 and icc 14.0.2, trying various flags with both.

rpmcnally
  • 249
  • 2
  • 12

2 Answers2

4

Worth reading: Pep 0008 > Programming Recommendations:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b . This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

Reference: https://www.python.org/dev/peps/pep-0008/#programming-recommendations

Czarek Tomczak
  • 20,079
  • 5
  • 49
  • 56
  • Knowing that CPython specifically optimizes for this does help. Indeed, changing the third example to 'val = str(i) + val' makes CPython take even longer than Cython (~24s). So maybe the real question is, how do I know what CPython optimizes that other implementations may not? I'm certain that string concatenation is not the real issue in our codebase. – rpmcnally Mar 04 '16 at 14:44
  • 2
    For the curious about where the optimisation happens, see the CPython interpretor source: https://hg.python.org/cpython/file/7fa3e824a4ee/Python/ceval.c#l1677. Note the special case check for "pyunicode" (in Python 3 at least!). In contrast Cython just does `PyNumber_InPlaceAdd` – DavidW Mar 04 '16 at 17:54
  • If you want to know where CPython does optimizations that Cython doesn't then searching that file for `_CheckExact` may be a good place to start, although it's probably a little tedious. The main other obvious candidate I can see if `%` string formatting – DavidW Mar 04 '16 at 17:55
2

Repeated string concatenation of that form is usually frowned upon; some interpreters optimize for it anyway (secretly overallocating and allowing mutation of technically immutable data types in cases where it's known to be safe), but Cython is trying to hard code some things, which makes that harder.

The real answer is "Don't concatenate immutable types over and over." (it's wrong everywhere, just worse in Cython). A perfectly reasonable approach Cython would likely handle fine is to make a list of the individual str, and then call ''.join(listofstr) at the end to make the str at once.

In any event, you're not giving Cython any typing information to work with, so the speed ups aren't going to be very impressive. Try to help it out with the easy stuff, and the speed ups there may more than make up for losses elsewhere. For example, cdef your loop variable and using ''.join might help here:

cpdef str2():
    cdef int i
    val = []
    for i in xrange(100000):  # Maybe range; Cython docs aren't clear if xrange optimized
        val.append('a')
    val = ''.join(val)
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • 2
    The `cython` code has to make Python function calls - to format the integer, and to create a new string. It may translate the iteration into C, but that's a minor part of the action. – hpaulj Mar 04 '16 at 06:21
  • Thanks for the answer. I realize this type of concatenation is terrible practice, but was under the impression that, without optimization, Cython code would spit out essentially the same code as CPython but apparently not. But our codebase is pretty mature so I would be shocked to find this type of code anywhere; do you have pointers to any other things that CPython might be optimizing but Cython would not? – rpmcnally Mar 04 '16 at 14:24
  • @rpmcnally: It's going to be really hard to enumerate those sorts of things. Cython and CPython optimization are near polar opposites in some respects; Cython benefits from using lots of low-level "primitives" (looping over ranges and indexing a list for instance) because it's easy to translate to C; CPython by contrast is much faster directly iterating the `list` or using `enumerate` if the `list` is needed. Basically, Cython is fastest when you write C-like Python code, using C-like assumptions (one of which is, string concatenation sucks). – ShadowRanger Mar 04 '16 at 14:52
  • I was afraid of that. The problem is that profiling is showing the slowdown across the board; almost everything is slower. I guess I'll just have to figure out a better way to profile this code. So what is proper etiquette for "I asked a poor question"? Just delete it or pick the closest thing to an answer? – rpmcnally Mar 04 '16 at 15:22
  • 1
    @rpmcnally I'd leave the question as is and (optionally) pick the answer that best answered the original question about string concatenation. The more general question of "what does Cython do badly compared to CPython?" is probably harder to get a definitive answer to and might be closed as "too broad" – DavidW Mar 04 '16 at 17:49