
I am trying to convert part of my R code into Python, and I am facing some problems in the process.

I have R code, shown below, in which I save my R output in .rdata format.

nms <- names(mtcars)
save(nms,file="mtcars_nms.rdata")

Now I have to load mtcars_nms.rdata into Python. I imported the rpy2 module and tried to load the file into the Python workspace, but I could not see the actual output.

I used the following Python code to import the .rdata file.

import pandas as pd
from rpy2.robjects import r,pandas2ri
pandas2ri.activate()

robj = r.load('mtcars_nms.rdata')
robj

My Python output is:

R object with classes: ('character',) mapped to:
<StrVector - Python:0x000001A5B9E5A288 / R:0x000001A5B9E91678>
['mtcars_nms']

Now my objective is to extract the information from mtcars_nms.

In R, we can do this by using

load("mtcars_nms.rdata");
get('mtcars_nms')

Now I want to do the same thing in Python.

RSK

3 Answers


There is a new Python package, pyreadr, that makes it very easy to import RData and Rds files into Python:

import pyreadr

result = pyreadr.read_r('mtcars_nms.rdata')

mtcars = result['mtcars_nms']

It does not depend on having R or any other external dependency installed. It is a wrapper around the C library librdata, so it is very fast.
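
For instance, a quick way to check which objects a file contains is to inspect the keys of the returned (dict-like) result; the variable names here are just illustrative:

import pyreadr

result = pyreadr.read_r('mtcars_nms.rdata')
print(list(result.keys()))    # names of the objects stored in the file, here ['mtcars_nms']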

You can install it very easily with pip:

pip install pyreadr

The repo is here: https://github.com/ofajardo/pyreadr

Disclaimer: I am the developer.

Otto Fajardo
  • Hi, when I try to load an .rdata file of size 700 MB, this error occurs: ValueError: Unable to allocate memory – jmir Jan 07 '19 at 07:59
  • Hi, thanks for trying the package! A 700 MB RData file is huge! Do you have enough RAM? Take into account that RData is highly compressed. For example, I have a 40 MB RData file; if I save it as CSV it takes 450 MB, so 11 times more. If you load that into R it occupies 1.3 GB, and loading it into Python takes 1.8 GB, meaning 45 times more (pandas is very hungry). That means that for a 700 MB RData file you need at least 32 GB of RAM, and that would be very tight, because the process transiently needs even more RAM. I would say you would need at least 64 GB of RAM or so. – Otto Fajardo Jan 08 '19 at 09:44
  • In the librdata C library there is a hard-coded limit on the maximum number of bytes a vector in a data frame can have. It used to be 2**24 bytes, meaning 2**22 integers or 2**21 doubles, which is relatively low. If a vector contained more bytes than this, a memory error was raised. The limit has been increased in the newest version to 2**32 bytes, meaning 2**30 integers or 2**29 doubles, which should be enough for practical purposes. – Otto Fajardo Apr 14 '19 at 13:15

Rather than using the .rdata format, I would recommend using feather, which allows you to efficiently share data between R and Python.

In R, you would run something like this:

library(feather)
# write_feather() expects a data frame, so wrap the character vector in one
write_feather(data.frame(nms = nms), "mtcars_nms.feather")

In Python, you can then load the data into a pandas DataFrame by simply running:

import pandas as pd
nms = pd.read_feather("mtcars_nms.feather")
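
The result of pd.read_feather is a pandas DataFrame; to get the names back as a plain Python list, a minimal follow-up sketch (assuming the column is called nms, as written in the R snippet above) could be:

import pandas as pd

nms = pd.read_feather("mtcars_nms.feather")
names = nms["nms"].tolist()   # the mtcars column names as a plain Python list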
mloning

The R function load will return an R vector of names for the objects that were loaded (into GlobalEnv).

You'll have to do in rpy2 pretty much what you are doing in R:

R:

get('mtcars_nms')

Python/rpy2:

robjects.globalenv['mtcars_nms']
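
Putting it together, a minimal sketch (assuming mtcars_nms.rdata is in the working directory; the variable names are just illustrative):

from rpy2 import robjects

loaded = robjects.r['load']('mtcars_nms.rdata')   # StrVector with the names of the loaded objects
nms = robjects.globalenv['mtcars_nms']            # the character vector itself
print(list(nms))                                  # the mtcars column names as a plain Python list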
lgautier