
I have .npy files that were created using Python 2.7.9 and Numpy Version 1.11.3 with the command np.save('filename'). The files were produced on an external machine that is part of the linux-cluster of our institute. I copied the files to my local machine in order to import them via np.load('filename.npy'). On my local machine I am running Python 3.5.2 and Numpy Version 1.13.0 with Jupyter-Notebook. The local OS is Ubuntu 16.04.2.

When I try to load the files locally I get the error:

ValueError: invalid literal for int() with base 16

After browsing through some Stackoverflow questions I tried to specify the encoding with:

np.load('filename.npy',encoding='latin1')

This gives the same error. encoding='bytes' yields:

TypeError: can't multiply sequence by non-int of type 'float'

Here is a larger snippet of the Traceback:

/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
417             else:
418                 return format.read_array(fid, allow_pickle=allow_pickle,
--> 419                                          pickle_kwargs=pickle_kwargs)
420         else:
421             # Try a pickle

/usr/local/lib/python3.5/dist-packages/numpy/lib/format.py in read_array(fp, allow_pickle, pickle_kwargs)
638             pickle_kwargs = {}
639         try:
--> 640             array = pickle.load(fp, **pickle_kwargs)
641         except UnicodeError as err:
642             if sys.version_info[0] >= 3:

/usr/local/lib/python3.5/dist-packages/sympy/core/numbers.py in __new__(cls, num, prec)
823                 else:
824                     _mpf_ = mpmath.mpf(
--> 825                         S.NegativeOne**num[0]*num[1]*2**num[2])._mpf_
826         elif isinstance(num, Float):
827             _mpf_ = num._mpf_

TypeError: can't multiply sequence by non-int of type 'float'

I suspect that something went wrong with the encoding in the transition between the Python and NumPy versions. Any ideas on how I can load the files?

P. Aumann
  • You can't load Python 2 NumPy bytecode in Python 3, or vice versa; it just doesn't make any sense to try. Have you tried loading the .npy files in Python 2? You already have Python 2 installed if you're using Ubuntu. – cat Jun 16 '17 at 14:37
  • Is it generally impossible? I think I imported Python 2 .npy files in Python 3 before and everything went smoothly. I can't tell for certain why it didn't lead to an error then. Loading the files in Python 2 works (I insert `%%python2` at the beginning of the notebook cell to do so). But using Python 2 leads to further errors, so I was looking for a solution that lets me stick with Python 3 for these files. – P. Aumann Jun 16 '17 at 14:53
  • Do you know what's in this file? Just a numeric array? Or some sort of `object`? The error is in `pickle.load`, suggesting the latter. The `np.save` docs have some cautions regarding PY2/3 compatibility when pickling objects. – hpaulj Jun 16 '17 at 17:03
  • This was a crucial hint, thank you! The files are basically just 2-dimensional numeric arrays. However, the numerical values stored in these arrays are of the type `sympy.core.numbers.Float`. Saving and loading with NumPy leads to an array with `dtype=object`. I guess a proper workaround for my problem is to convert each SymPy number to a Python float before writing it into the list. @cat thank you for your explanations as well! – P. Aumann Jun 16 '17 at 18:18
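
The workaround described in the last comment could look roughly like the sketch below. It is not the asker's actual code: `decimal.Decimal` stands in for `sympy.core.numbers.Float` so that the example needs only NumPy, on the assumption that both convert cleanly through `float()`, and `to_float_array` is a hypothetical helper name.

```python
import io
from decimal import Decimal

import numpy as np


def to_float_array(values):
    """Convert a nested sequence of number-like objects to a float64 array."""
    # Build an object array first so NumPy does not guess a dtype,
    # then let astype() convert each element to a plain float.
    return np.array(values, dtype=object).astype(np.float64)


# Stand-in data: Decimal plays the role of the SymPy Float here (assumption).
rows = [[Decimal("1.5"), Decimal("2.25")], [Decimal("-3.0"), Decimal("4.0")]]

obj_arr = np.array(rows, dtype=object)  # this kind of array gets pickled inside the .npy
num_arr = to_float_array(rows)          # plain numeric data, no pickle involved

buf = io.BytesIO()                      # in-memory buffer instead of a file on disk
np.save(buf, num_arr)
buf.seek(0)
loaded = np.load(buf, allow_pickle=False)
```

Because the converted array has a numeric dtype, `np.load` can read it back with `allow_pickle=False`, which sidesteps the Python 2/3 pickle differences mentioned in the comments. The conversion drops any extra precision the original objects carried beyond a 64-bit float.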

1 Answer


As shown in "What is the way data is stored in *.npy?", .npy files are bytecode, which you will see if you open one in a hex editor.

Python 2 bytecode files (.pyc, .pyo) cannot be run in Python 3, as the virtual machine and compiler internals have changed with the major version.

Similarly, NumPy's C internals and bytecode compiler have changed as well in Python 3, breaking backwards compatibility. (This is intentional since bytecode is not meant to persist for so long, or be used in different versions than it was created.)

The composition of these changes means that there is no way, without huge changes to Python 3's bytecode interpreter and Python 3's NumPy, and/or a transpiler from Python 2 NumPy bytecode to that of Python 3, to use these Python 2 .npy files in Python 3.


As I alluded to earlier, this is a bit of an X/Y problem. You should not rely on the .npy files working across versions, because that is not guaranteed: they are inherently a volatile format (like Python VM bytecode).

Instead of reverse-engineering the bytecode to debug it, try to get the source from which these files were generated.

cat
  • But a `.npy` file isn't saving Python bytecodes or any compiled code. It's saving data - array attributes plus the array data buffer. There are incompatibilities when pickling between Py2 and Py3, but numeric arrays shouldn't be different. – hpaulj Jun 16 '17 at 17:07
  • @hpaulj No, but the binary format has changed between versions 2 and 3, and nobody should (have been) relying on it to stay the same. Just as an ELF binary compiled for Linux 2.6 will not run on Linux 4.15, yet the binary contains almost none of the compiler's internals (excepting `_start` and so on), the ELF interpreter `ld.so` cannot make heads or tails of your old binary, so Python 3 cannot make heads or tails of a Python 2 `pyc` or `npy` file. – cat Oct 15 '18 at 18:23
  • By the way, there's no guarantee that CPython PyObject memory layouts didn't change between 2 and 3, and there's no guarantee that Numpy's C object memory layouts and serialisation didn't change between 2 and 3 -- it probably did to improve performance with a major version. – cat Oct 15 '18 at 18:25
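
hpaulj's description of the file contents (array attributes plus the data buffer) can be checked by reading the header directly. The sketch below uses `numpy.lib.format.read_magic` and `read_array_header_1_0`. It assumes a format version 1.0 file, which is what `np.save` should write for small arrays like these, and `describe_npy` is a hypothetical helper name.

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def describe_npy(fp):
    """Return (version, shape, dtype) from an open .npy file without reading the data."""
    version = npy_format.read_magic(fp)  # consumes the magic string and version bytes
    if version != (1, 0):
        raise ValueError("this sketch only handles format version 1.0")
    shape, fortran_order, dtype = npy_format.read_array_header_1_0(fp)
    return version, shape, dtype


# A plain numeric array: the header describes shape and dtype, then raw data follows.
buf = io.BytesIO()
np.save(buf, np.zeros((2, 3)))
raw = buf.getvalue()
buf.seek(0)
version, shape, dtype = describe_npy(buf)

# An object array: the header reports an object dtype and the payload is a pickle.
obj_buf = io.BytesIO()
np.save(obj_buf, np.array([[1.5, None]], dtype=object), allow_pickle=True)
obj_buf.seek(0)
_, obj_shape, obj_dtype = describe_npy(obj_buf)
uses_pickle = obj_dtype.hasobject
```

If `uses_pickle` is true for a file, the array body was written with `pickle`, which is where the Python 2/3 differences from the question come into play. A numeric dtype means the body is just the raw buffer.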