
I read through the Python 3.2 changes and understand that it makes many improvements over 3.1. However, my exact same code, with zero modification, runs more than 10 times slower on 3.2 than on 3.1.3.

Python 3.2 took six minutes to transfer the binary content of a file to a physical device, then receive the data back and print it on screen. The exact same scenario on the same PC takes only 30 seconds with Python 3.1.3.

I developed my code from scratch with Python 3.1.2, and 20% of it uses ctypes to perform transactions with USB/PCI devices through a Windows driver, so I don't think this performance hit has anything to do with backward compatibility. In my application, I create four instances of threading.Thread subclasses, each dealing with one PCI or USB device on the system. I suspect that either ctypes performance got worse in 3.2, or there is more to threading.Thread that I have to tune to get the multi-threading performance I want. I would much appreciate it if anyone could shed some light on this.

=========================================

More diagnostics

I reduced the amount of data to be sent and received.

Python 3.1.3 takes 3 seconds to complete, as shown in this system resource monitor screenshot: http://img62.imageshack.us/img62/5313/python313.png

Python 3.2 takes about 1 minute to complete, as shown in this system resource monitor screenshot: http://img197.imageshack.us/img197/8366/python32.png

My PC has a single-core Intel P4 with 2 GB of RAM, so I think we can rule out GIL contention between multiple cores.

I used yappi to profile multiple runs and average out the results on both 3.1.3 and 3.2. Both threading and ctypes perform badly on Python 3.2.

These are the calls into the thread-safe queue (queue.py) that ships with the standard Windows binary of Python:

on 3.1.3
name                                 #n       tsub       ttot       tavg
C:\Python31\lib\queue.py.qsize:86    46070    1.352867   4.234082   0.000092
C:\Python31\lib\queue.py._get:225    8305     0.012457   0.017030   0.000002
C:\Python31\lib\queue.py.get:167     8305     0.635926   1.681601   0.000202
C:\Python31\lib\queue.py._put:221    8305     0.016156   0.020717   0.000002
C:\Python31\lib\queue.py.put:124     8305     0.095320   1.138560   0.000137

on 3.2
name                                 #n       tsub       ttot       tavg
C:\Python32\lib\queue.py.qsize:86    252168   4.987339   15.229308  0.000060
C:\Python32\lib\queue.py._get:225    8305     0.030431   0.035152   0.000004
C:\Python32\lib\queue.py.get:167     8305     0.303126   7.898754   0.000951
C:\Python32\lib\queue.py._put:221    8305     0.015728   0.020928   0.000003
C:\Python32\lib\queue.py.put:124     8305     0.143086   0.431970   0.000052

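Thread-wise performance is just insanely bad on Python 3.2: for the same 8305 items, qsize is called about 5.5 times as often and the total time spent in get rises from 1.68 s to 7.90 s.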
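To put numbers on the two tables above, here is a small sketch that only recomputes ratios from the yappi figures already shown. The dictionary layout and helper names are made up for illustration; nothing here measures anything new.

```python
# (calls, ttot in seconds) copied from the yappi tables above.
profiles = {
    "3.1.3": {"qsize": (46070, 4.234082),
              "get": (8305, 1.681601),
              "put": (8305, 1.138560)},
    "3.2": {"qsize": (252168, 15.229308),
            "get": (8305, 7.898754),
            "put": (8305, 0.431970)},
}

CALLS, TTOT = 0, 1  # column indexes into the tuples above


def qsize_calls_per_item(version):
    """How many qsize() polls happen per item actually fetched with get()."""
    return profiles[version]["qsize"][CALLS] / profiles[version]["get"][CALLS]


def ratio(name, column):
    """Python 3.2 figure divided by the Python 3.1.3 figure."""
    return profiles["3.2"][name][column] / profiles["3.1.3"][name][column]


if __name__ == "__main__":
    for version in ("3.1.3", "3.2"):
        print(version, "qsize polls per item: %.1f" % qsize_calls_per_item(version))
    for name in ("qsize", "get", "put"):
        print(name, "ttot ratio (3.2 / 3.1.3): %.2f" % ratio(name, TTOT))
```

The number of get and put calls is identical on both versions, so the extra time comes from polling more often and from each get taking longer, not from moving more data.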

Another example: this function simply calls an API in the Windows USB driver through the ctypes module and requests 16 bits of data from the USB device.

on 3.1.3
name                                 #n       tsub       ttot       tavg
..ckUSBInterface.py.read_register:14 1        0.000421   0.000431   0.000431
on 3.2
name                                 #n       tsub       ttot       tavg
..ckUSBInterface.py.read_register:14 1        0.015637   0.015651   0.015651

As you can see, the call takes about 36 times longer on Python 3.2 (0.015651 s versus 0.000431 s).

Python 3.2 seems like a disaster for my application.
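Both read_register rows above come from a single call (#n is 1), so the comparison is sensitive to timer resolution and one-off costs. The 3.2 figure of roughly 0.0156 s is also close to the default Windows timer tick of about 15.6 ms, so it may partly reflect timer granularity; that is a guess, not something the profile proves. A minimal sketch of averaging over many calls follows. `fake_read_register` is a hypothetical stand-in, since the real ctypes call is not shown in this post.

```python
from timeit import default_timer


def average_call_time(func, repeats=1000):
    """Return the mean wall-clock seconds per call of func over `repeats` calls."""
    start = default_timer()
    for _ in range(repeats):
        func()
    return (default_timer() - start) / repeats


def fake_read_register():
    # Hypothetical stand-in for the real driver call made through ctypes;
    # it just returns a 16-bit value so the sketch can run anywhere.
    return 0xBEEF & 0xFFFF


if __name__ == "__main__":
    mean = average_call_time(fake_read_register)
    print("mean seconds per call: %.9f" % mean)
```

Running the same script under both interpreters, with the real call substituted for the stand-in, would give a per-call figure that does not depend on a single sample.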

agf
SCM
  • Did you end up tracking down what this is? It might be a regression in Python. It seems unlikely that the language behaviour changed, but you might look carefully to see if this could be the case. – Matt Joiner Jun 15 '11 at 12:05
  • I ended up uninstalling Python 3.2 from all of the machines and reinstalling 3.1.3. – SCM Jul 26 '11 at 01:42

1 Answer


There is no obvious reason why this should be. You'll need to profile the app to see exactly what takes this additional time.

Lennart Regebro
  • The printout to the screen is real-time, meaning my code prints every block of data it receives through the Windows driver as soon as the data arrives. On Python 3.2, the printout is so slow that I can read every binary character on the screen as it is printed. On Python 3.1.3, the printout is so fast that I can't read anything on the screen while the data is being printed. That's a huge performance difference: 30 seconds on 3.1 versus 6 minutes on 3.2. I thought the GIL was improved from 3.1 to 3.2... – SCM Apr 21 '11 at 23:39
  • @SCM: Aha, that is very interesting. You need to profile the app to see exactly what takes this additional time. – Lennart Regebro Apr 22 '11 at 10:06
  • I have edited the question and posted yappi profiling of my multithreaded application, comparing Python 3.2 and Python 3.1.3. The result is surprisingly bad for Python 3.2. – SCM Apr 22 '11 at 22:17
  • @SCM: OK, the main question here is why qsize is being called 5 times as often. That's the main reason for the performance loss. I'm not sure why it's being called *at all* to be honest. – Lennart Regebro Apr 23 '11 at 05:34
  • Perhaps the same .py file is being compiled into a different PyCodeObject by 3.2 and 3.1.3. I did not modify any of the code when running it on both versions of Python, so I can't think of any other reason why there are differences between the two. – SCM Apr 25 '11 at 17:06
  • @SCM: I don't see what that has to do with qsize. – Lennart Regebro Apr 25 '11 at 19:56
  • In my application, I have a thread-safe queue for each thread. I want threads to skip actions instead of being blocked by empty or full queues, so each thread checks its queue before performing an action. Is that not normal for a multithreaded Python application? – SCM Apr 25 '11 at 21:12
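
For the skip-instead-of-block behaviour described in the last comment, the standard library queue also offers get_nowait() and put_nowait(), which raise queue.Empty and queue.Full rather than blocking. Below is a minimal sketch of that pattern, not the asker's actual code; the helper names try_get and try_put are made up for illustration.

```python
import queue


def try_get(q):
    """Return (True, item) if an item was waiting, else (False, None)."""
    try:
        return True, q.get_nowait()
    except queue.Empty:
        return False, None


def try_put(q, item):
    """Return True if the item was queued, False if the queue was full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


if __name__ == "__main__":
    q = queue.Queue(maxsize=1)
    print(try_put(q, b"block-0"))   # queue has room
    print(try_put(q, b"block-1"))   # queue is full, caller can skip
    print(try_get(q))               # item available
    print(try_get(q))               # queue is empty, caller can skip
```

This replaces a separate qsize() check, which takes the queue's lock on every call and can already be out of date by the time the thread acts on it. Whether it would remove the slowdown seen here is not established by the profile.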