I'm trying to run some code using PyPy to speed it up. My code uses Pandas dataframes, so I'm trying to find a way to install the package...

Unfortunately, I can't find a way to do that... searching online yields this and this -- two disappointing results which say it's not possible, but they are 1-2 years old!

There was a glimmer of hope in this Twitter post from Romain Guillebert, which suggests I could do it using a package called pymetabiosis. Unfortunately, when I try to install it, I get the error shown below.

Any idea how I can debug the error or find some other way of using Pandas with PyPy?


Error message when installing pymetabiosis:
Collecting pymetabiosis
  Using cached pymetabiosis-0.0.1.tar.gz
    Complete output from command python setup.py egg_info:
    pymetabiosis/__pycache__/_cffi__x771a6f66x197b9d2b.c:219:13: warning: initializing 'char **' with an expression of type 'const char **' discards qualifiers in nested pointer types [-Wincompatible-pointer-types-discards-qualifiers]
      { char * *tmp = &p->ml_name; (void)tmp; }
                ^     ~~~~~~~~~~~
    pymetabiosis/__pycache__/_cffi__x771a6f66x197b9d2b.c:220:13: warning: incompatible pointer types initializing 'void **' with an expression of type 'PyCFunction *' (aka 'struct _object *(**)(struct _object *, struct _object *)') [-Wincompatible-pointer-types]
      { void * *tmp = &p->ml_meth; (void)tmp; }
                ^     ~~~~~~~~~~~
    pymetabiosis/__pycache__/_cffi__x771a6f66x197b9d2b.c:222:13: warning: initializing 'char **' with an expression of type 'const char **' discards qualifiers in nested pointer types [-Wincompatible-pointer-types-discards-qualifiers]
      { char * *tmp = &p->ml_doc; (void)tmp; }
                ^     ~~~~~~~~~~
    pymetabiosis/__pycache__/_cffi__x771a6f66x197b9d2b.c:1189:30: warning: incompatible pointer types passing 'PyObject *' (aka 'struct _object *') to parameter of type 'PyCodeObject *' [-Wincompatible-pointer-types]
      { result = PyEval_EvalCode(x0, x1, x2); }
                                 ^~
    //anaconda/include/python2.7/eval.h:10:54: note: passing argument to parameter here
    PyAPI_FUNC(PyObject *) PyEval_EvalCode(PyCodeObject *, PyObject *, PyObject *);
                                                         ^
    pymetabiosis/__pycache__/_cffi__x771a6f66x197b9d2b.c:1857:12: warning: incompatible integer to pointer conversion assigning to 'PyObject *' (aka 'struct _object *') from 'int' [-Wint-conversion]
      { result = PyObject_SetAttr(x0, x1, x2); }
               ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    pymetabiosis/__pycache__/_cffi__x771a6f66x197b9d2b.c:2164:5: warning: incompatible pointer types assigning to 'PyObject *(*)(size_t, ...)' (aka 'struct _object *(*)(unsigned long, ...)') from 'PyObject *(Py_ssize_t, ...)' (aka 'struct _object *(long, ...)') [-Wincompatible-pointer-types]
      i = (PyTuple_Pack);
        ^ ~~~~~~~~~~~~~~
    6 warnings generated.
    ld: warning: directory not found for option '-L//anaconda/lib
    '
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/t4/n42mh55n05sd6s5hgk1dzdz80000gp/T/pip-build-fSTXPd/pymetabiosis/setup.py", line 2, in <module>
        from pymetabiosis.bindings import ffi
      File "pymetabiosis/__init__.py", line 1, in <module>
        from pymetabiosis.module import import_module
      File "pymetabiosis/module.py", line 2, in <module>
        from pymetabiosis.wrapper import MetabiosisWrapper
      File "pymetabiosis/wrapper.py", line 3, in <module>
        from __pypy__ import identity_dict
    ImportError: No module named __pypy__

    ----------------------------------------
Command "python setup.py egg_info" failed with error code -11 in /private/var/folders/t4/n42mh55n05sd6s5hgk1dzdz80000gp/T/pip-build-fSTXPd/pymetabiosis/
Afflatus
  • pypy still conflicts with pandas. – kilojoules Sep 14 '16 at 22:40
  • @kilojoules :( (but thanks for letting me know) – Afflatus Sep 14 '16 at 22:41
  • ``PyDateTime_GET_MONTH(val) = -1;`` looks like the error posted on http://packages.pypy.org/##pandas . If that's all, someone should fix that. Contributions welcome! – Armin Rigo Sep 15 '16 at 09:26
  • @Afflatus did you eventually install pandas over pypy? – NI6 Dec 01 '18 at 16:58
  • @NI6 don't remember but I doubt it. – Afflatus Dec 01 '18 at 17:09
  • Coming back to this question after much more experience with pandas/Python: I highly doubt that using PyPy will speed up pandas, since most of the core pandas routines are written in C and are already fast when used correctly. A PyPy build of pandas would probably run the same underlying C code anyway, so large performance gains are unlikely. A quick search will show you how to use pandas efficiently on CPython. – Hansang Nov 05 '20 at 13:51
  • Yes, pandas is still not building for pypy3 V7.3.7 (python 3.8.12) – 0xc0de Jan 02 '22 at 08:06
  • @0xc0de what about PyPy v7.3.9? – Prince Roshan Apr 03 '22 at 18:38

3 Answers

PyPy v5.9 has started to support pandas (and NumPy).

NI6

Here's what I did. I'm assuming you're using Conda, but pip/venv should also work.

Make a new conda env

conda create --name pypy_env
conda activate pypy_env

Install pypy3 using conda

conda install pypy3

Get pip for pypy3 using the method described in "Install pip on pypy"

Install packages for pypy using

pypy3 -m pip install pandas
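
A common pitfall with the steps above is that a plain `python` or `pip` command still points at CPython rather than PyPy. As a minimal sketch (the helper names `describe_interpreter` and `is_installed` are hypothetical, not part of PyPy or pandas), the following script can be run with `pypy3` to report which interpreter is active and whether pandas can be found by it:

```python
import importlib.util
import platform
import sys


def describe_interpreter():
    """Return basic facts about the interpreter running this script."""
    return {
        # "PyPy" under PyPy, "CPython" under the reference interpreter
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        "executable": sys.executable,
    }


def is_installed(package_name):
    """Report whether a top-level package is visible to this interpreter."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, "=", value)
    print("pandas importable:", is_installed("pandas"))
```

If the reported implementation is not PyPy, the install command was most likely run against a different interpreter than intended.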
s5s
Hansang
  • Using it with pip didn't work for me. I checked PyPy's website and found this: [supported packages](http://packages.pypy.org/##python-keystoneclient). Seems like Pandas is not supported (at least not without some hacky workarounds) – hhlw Mar 19 '20 at 14:29
  • FYI for anyone that found this answer useful, see my comment above on the question - but TL;DR you're unlikely to get much benefit from doing this, even if it works. – Hansang Nov 05 '20 at 13:54
  • Doesn't work with pypy3 V7.3.7 (python 3.8.12) – 0xc0de Jan 02 '22 at 08:05
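
The point made in the comments, that code already running in compiled C has little to gain from PyPy, can be checked empirically. The sketch below is only an illustration under stated assumptions: it uses the standard library, with the built-in `sum` standing in for a compiled routine and a hand-written loop standing in for Python-level code. The names `loop_sum` and `best_time` are hypothetical. Running the same script under both `python3` and `pypy3` shows which kind of code actually changes speed on a given machine.

```python
import timeit


def loop_sum(values):
    """Sum values with an explicit Python-level loop."""
    total = 0
    for v in values:
        total += v
    return total


def best_time(func, values, repeat=3, number=10):
    """Return the best wall-clock time (seconds) over several timing runs."""
    return min(timeit.repeat(lambda: func(values), repeat=repeat, number=number))


if __name__ == "__main__":
    data = list(range(100_000))
    # Explicit loop: executed by the interpreter itself.
    print("explicit loop:", best_time(loop_sum, data))
    # Built-in sum: used here as a stand-in for a compiled routine.
    print("built-in sum: ", best_time(sum, data))
```

No timings are quoted here because they depend on the interpreter version and hardware. The same harness can time a real pandas operation once pandas is installed.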

It's simple to install and configure pypy3 on an Ubuntu/Debian machine:

sudo add-apt-repository ppa:pypy/ppa
sudo apt update
sudo apt install pypy3

Optionally, you can also install:

sudo apt install pypy3-dev pypy3-venv

Then you can install pandas and numpy globally:

pypy3 -m pip install pandas numpy

Installing through pip may take a few minutes to complete.

Tarek Kalaji