
I have a large dataset with unix epoch dates embedded in lists/dicts, currently stored as a pickle file. I tried to import the pickle file into R using the reticulate package via the py_load_object() function. Other than the unix epoch dates (in milliseconds), all else seems fine.

I get very strange integer conversions. For example, an epoch date of 694137600000 is read as -1647101952 in R. I was wondering if there is an explanation and a work-around.

Thanks!

kurofune

2 Answers


It is very hard to help you without a minimal reproducible example, but here are some ideas:

  • You can un-pickle the file and convert it to a pandas data frame inside your Python script; the source_python() function from reticulate can then bring it into R as a data frame (see the sketch after this list). Please refer to the documentation for additional information on type conversions: rstudio/reticulate
  • It is always possible to un-pickle the file and export it in a common format such as CSV using Python, and then import that into R. This way you bypass reticulate, which is not always an efficient option.
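
For the first idea, a minimal sketch of such a Python script might look like the following (the data.pkl file name, the load_as_dataframe() helper, and the timestamp_ms field are placeholder names, not anything taken from the question):

import pickle
import pandas as pd

def load_as_dataframe(path="data.pkl"):
    with open(path, "rb") as f:
        records = pickle.load(f)          # assumed: a list of flat dicts
    df = pd.DataFrame(records)
    # Converting the 13-digit millisecond epochs to datetime64 here means
    # R never has to hold an integer wider than 32 bits.
    df["timestamp_ms"] = pd.to_datetime(df["timestamp_ms"], unit="ms")
    return df

From R, source_python("load_data.py") would expose load_as_dataframe(), and reticulate should return its result as an R data frame, with the datetime column arriving as POSIXct.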

Please also note that you may need some help when it comes to handling 13-digit numbers in R; the bit64 package would be of interest to you.

OzanStats

The problem is that the values are being treated as 32-bit integers by reticulate - you can see the problem with the Python snippet below:

In [1]: v = 694137600000

In [2]: v.bit_length()
Out[2]: 40

In [3]: import ctypes

In [4]: ctypes.c_int(v)
Out[4]: c_long(-1647101952)

In [5]: _.value
Out[5]: -1647101952

In [6]: ctypes.c_int64(v)
Out[6]: c_longlong(694137600000)

In [7]: ctypes.c_int32(v)
Out[7]: c_long(-1647101952)
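
The same wrap-around can be reproduced with plain integer arithmetic as a quick check - keeping only the low 32 bits of the value and reinterpreting them as a signed integer gives exactly the number seen in R:

v = 694137600000
low32 = v % 2**32                 # 2647865344 - only the low 32 bits survive
print(low32 - 2**32)              # -1647101952, since low32 >= 2**31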

One of the easiest workarounds is to unpickle your file in Python and save it as a .csv file. You should also find that if you convert the pickled data to a pandas data frame and then access it from R, it will be converted to an R data frame - unless the date/time is the first column (see here for why).
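
A rough sketch of the CSV route could look like this (data.pkl, data.csv, and the timestamp_ms key are assumed names; adapt them to the actual structure of your pickled lists/dicts):

import csv
import pickle
from datetime import datetime, timezone

with open("data.pkl", "rb") as f:
    records = pickle.load(f)          # assumed: a list of flat dicts

for rec in records:
    # Write the millisecond epochs out as ISO-8601 strings so R never sees
    # a 13-digit integer that could be truncated to 32 bits.
    rec["timestamp_ms"] = datetime.fromtimestamp(
        rec["timestamp_ms"] / 1000, tz=timezone.utc
    ).isoformat()

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)

On the R side the column can then be parsed back with as.POSIXct() (or readr's datetime parsers) at import time.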

Steve Barnes