I have a bunch of .RData time-series files and would like to load them directly into Python without first converting the files to some other format (such as .csv). Any ideas on the best way to accomplish this?
7 Answers
As an alternative for those who would prefer not to install R to accomplish this task (rpy2 requires it), there is a new package, pyreadr, which reads RData and Rds files directly into Python without depending on R.
It is a wrapper around the C library librdata, so it is very fast.
You can install it easily with pip:
pip install pyreadr
As an example you would do:
import pyreadr
result = pyreadr.read_r('/path/to/file.RData') # also works for Rds
# done! let's see what we got
# result is a dictionary whose keys are the object names and whose values are
# the corresponding Python objects (pandas DataFrames)
print(result.keys()) # let's check what objects we got
df1 = result["df1"] # extract the pandas data frame for object df1
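Because the keys depend on what was saved in the file (the comments below report a KeyError for the dummy name df1, and a result whose only key is None), a small helper can fail with a clearer message. This is a minimal sketch, not part of pyreadr: pick_object is a hypothetical name, and it only assumes that read_r returns a name-to-object mapping, as shown above.

```python
from typing import Any, Hashable, Mapping, Optional


def pick_object(result: Mapping[Hashable, Any], name: Optional[Hashable] = None) -> Any:
    """Return one object from a name -> object mapping such as read_r's result.

    With `name`, that key must exist; without it, the mapping must hold exactly
    one object (whatever its key, including None). Errors list the keys found.
    """
    available = list(result.keys())
    if name is None:
        if len(available) != 1:
            raise KeyError(f"expected exactly one object, found keys {available}")
        return result[available[0]]
    if name not in result:
        raise KeyError(f"{name!r} not found; available keys: {available}")
    return result[name]


# Hypothetical usage with the result from above:
# df1 = pick_object(result, "df1")   # explicit object name
# only = pick_object(result)         # file holding a single object
```

Raising KeyError keeps the same exception type a plain result["df1"] lookup would raise, so existing error handling still works.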
The repo is here: https://github.com/ofajardo/pyreadr
Disclaimer: I am the developer of this package.

- KeyError: 'df1' – rsc05 Feb 03 '20 at 14:13
- df1 is just a dummy example; you have to use a key from result.keys(). For issues, please use the GitHub repo's issue tracker. – Otto Fajardo Feb 04 '20 at 17:17
- Hi, I am struggling with this in Python 3.7.6. I installed with `pip install readr` and `conda install -c conda-forge pyreadr`. I managed to import RData into Python as an `OrderedDict`, but I cannot access the data inside. Even with a basic 3x2 RData file, `df.keys()` yields `odict_keys([None])`. Has something changed in Python or R that you need to update, or am I missing something? – gmarais Jul 07 '20 at 18:20
- Please file an issue on GitHub if you have problems. Read the Known limitations section in the README carefully first. When submitting an issue, post the code and file to reproduce the problem; otherwise I cannot help. – Otto Fajardo Jul 08 '20 at 19:35
- I'm getting the error `LibrdataError: Unable to convert string to the requested encoding` when trying to read a `.RData` file. Other `.RData` files work, but the one with the data in the format I need doesn't load, go figure! Any common reasons why this might happen? – AcidCatfish Jul 15 '21 at 05:16
- Right now pyreadr can handle only files saved in UTF-8. Yours is probably in some other encoding. There is an open issue on GitHub about that error. – Otto Fajardo Jul 15 '21 at 14:30
- Would you mind having a look here? https://stackoverflow.com/questions/72702702/batching-large-rfiles-in-pyreadr Thanks – Tomasz Kania Jun 21 '22 at 14:37
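The comment above notes that pyreadr only handles files saved in UTF-8. Before loading, it may help to check which native encoding a file declares. The sketch below is a stdlib-only diagnostic, not part of pyreadr, and it assumes the serialization header layout found in R's serialize.c: an optional RDX2/RDX3 line written by save(), the "X" marker for XDR binary, three big-endian integers (format version, writing R version, minimum reading R version), and, for format version 3 (the default since R 3.6.0), a length-prefixed native encoding string. read_prefix and parse_header are hypothetical names.

```python
import bz2
import gzip
import lzma
import struct
from typing import Dict, Union

# Magic bytes of the compressors R's save()/saveRDS() can use.
_OPENERS = {b"\x1f\x8b": gzip.open, b"BZh": bz2.open, b"\xfd7zXZ\x00": lzma.open}


def read_prefix(path: str, n: int = 64) -> bytes:
    """Return the first n *decompressed* bytes of an .RData/.rds file."""
    with open(path, "rb") as f:
        magic = f.read(6)
    for sig, opener in _OPENERS.items():
        if magic.startswith(sig):
            with opener(path, "rb") as f:
                return f.read(n)
    with open(path, "rb") as f:  # saved uncompressed
        return f.read(n)


def _r_version(code: int) -> str:
    # R packs versions as major * 65536 + minor * 256 + patch.
    return f"{code >> 16}.{(code >> 8) & 0xFF}.{code & 0xFF}"


def parse_header(data: bytes) -> Dict[str, Union[str, int]]:
    """Summarise the serialization header of decompressed R data."""
    info: Dict[str, Union[str, int]] = {"container": "rds"}
    if data[:5] in (b"RDX2\n", b"RDX3\n"):  # save() adds this line; saveRDS() does not
        info["container"] = "RData"
        data = data[5:]
    if data[:2] != b"X\n":
        raise ValueError("not an XDR binary stream (ASCII saves are not handled)")
    version, writer, min_reader = struct.unpack_from(">iii", data, 2)
    info["format_version"] = version
    info["written_by"] = _r_version(writer)
    info["min_reader"] = _r_version(min_reader)
    if version == 3:  # version 3 also records the writer's native encoding
        (n,) = struct.unpack_from(">i", data, 14)
        info["native_encoding"] = data[18:18 + n].decode("ascii", "replace")
    return info


# Hypothetical usage:
# print(parse_header(read_prefix("problem_file.RData")))
```

Individual strings in the file also carry their own encoding flags, so a UTF-8 header is a hint rather than a guarantee that pyreadr will succeed.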
People ask this sort of thing on the R-help and R-devel mailing lists, and the usual answer is that the code is the documentation for the .RData file format. So any other implementation in any other language is hard++.
I think the only reasonable way is to install rpy2 and use R's load function from that, converting to appropriate Python objects as you go. The .RData file can contain structured objects as well as plain tables, so watch out.
Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/
Quicky:
>>> import rpy2.robjects as robjects
>>> robjects.r['load'](".RData")
# objects are now loaded into the R workspace
>>> robjects.r['y']
<FloatVector - Python:0x24c6560 / R:0xf1f0e0>
[0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]
That's a simple vector; d is a data frame, which I can subset to get columns:
>>> robjects.r['d'][0]
<IntVector - Python:0x24c9248 / R:0xbbc6c0>
[ 1, 2, 3, ..., 8, 9, 10]
>>> robjects.r['d'][1]
<FloatVector - Python:0x24c93b0 / R:0xf1f230>
[0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]
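If you don't know the object names in advance, note that R's load() returns a character vector naming the objects it restored. A minimal sketch, assuming only the robjects.r['...'] access shown above; load_rdata_objects is a hypothetical helper, and it takes the r handle as a parameter so it can be exercised without R.

```python
from typing import Any, Dict


def load_rdata_objects(r: Any, path: str) -> Dict[str, Any]:
    """Load an .RData file through an rpy2-style `r` handle, keyed by object name.

    R's load() returns the names of the objects it restored, so they need not
    be known in advance. The objects also land in the R workspace, replacing
    any existing objects with the same names.
    """
    names = r["load"](path)
    return {str(name): r[str(name)] for name in names}


# Hypothetical usage (requires R and rpy2):
# import rpy2.robjects as robjects
# objects = load_rdata_objects(robjects.r, "file.RData")
# objects["d"]  # the data frame from the example above
```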

- One useful comment to add: typically you will want to manipulate these objects in NumPy, so I would add the following: `from rpy2.robjects import numpy2ri` and then `numpy2ri.ri2numpy(r['d'])`. You then have NumPy arrays that you can manipulate in a "pythonic" way. – jonathanrocher Jul 30 '14 at 21:52
- @jonathanrocher it seems these days (numpy 1.11.1) you can just do `np.array(r['d'])`, and I don't have `ri2numpy` in `numpy2ri` anymore. – daknowles Dec 10 '16 at 19:02
- Note: the documentation is now hosted on GitHub: https://rpy2.github.io/doc/latest/html/ (the current link does not work) – krassowski Apr 24 '20 at 09:17
Jupyter Notebook Users
If you are using Jupyter Notebook, you need to do two steps:
Step 1: Go to http://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2 and download the wheel for "Python interface to the R language (embedded R)"; in my case I used rpy2-2.8.6-cp36-cp36m-win_amd64.whl.
Put this file in your current working directory.
Step 2: Go to your Jupyter notebook and write the following commands
# This is to install rpy2 library in Anaconda
!pip install rpy2-2.8.6-cp36-cp36m-win_amd64.whl
and then
# This is important if you will be using rpy2
# (raw string, so the backslashes in the Windows path are not read as escapes)
import os
os.environ['R_USER'] = r'D:\Anaconda3\Lib\site-packages\rpy2'
and then
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
This should allow you to use R functions in Python. Now get R's readRDS function as follows (readRDS reads .rds files; for .RData files, use R's load instead):
readRDS = robjects.r['readRDS']
df = readRDS('Data1.rds')
df = pandas2ri.ri2py(df)
df.head()
Congratulations! Now you have the DataFrame you wanted.
However, I advise saving it to a pickle file for later use in Python:
df.to_pickle('Data1')
Next time you can simply load it with:
import pandas as pd
df1 = pd.read_pickle('Data1')
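The pickle advice above can be wrapped in a small cache so the R-side conversion only runs when the source file is newer than its pickle. This sketch uses the standard-library pickle module rather than DataFrame.to_pickle (so it works for any picklable object); load_with_cache and its loader argument are hypothetical names, with loader standing for whatever function produced the DataFrame, for example the readRDS and conversion steps above. Only unpickle files you trust.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_with_cache(source: str, loader: Callable[[str], Any], cache: str) -> Any:
    """Return loader(source), reusing a pickle at `cache` while it is up to date."""
    src, cached = Path(source), Path(cache)
    # Reuse the pickle only if it was written after the source last changed.
    if cached.exists() and cached.stat().st_mtime >= src.stat().st_mtime:
        with cached.open("rb") as f:
            return pickle.load(f)
    obj = loader(source)  # the slow path, e.g. readRDS plus conversion
    with cached.open("wb") as f:
        pickle.dump(obj, f)
    return obj
```

Comparing modification times is a simple heuristic: it misses changes that don't touch the file's mtime, but it avoids rereading the source on every run.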

Well, a couple of years ago I had the same problem as you. I wanted to read .RData files from a library that I was developing. I considered using RPy2, but that would have forced me to release my library under a GPL license, which I did not want to do.
"pyreadr" didn't even exist then. Also, the datasets I wanted to load were not in a standardized format such as a data.frame.
I came to this question and read Spacedman's answer. In particular, I saw the line
So any other implementation in any other language is hard++.
as a challenge, and implemented the package rdata in a couple of days as a result. This is a very small pure Python implementation of a .RData parser and converter, which has suited my needs until now. The steps of parsing the original objects and converting them to appropriate Python objects are separated, so that users can use a different conversion if they want. Moreover, users can add constructors for custom R classes.
This is a usage example:
>>> import rdata
>>> parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_vector.rda")
>>> converted = rdata.conversion.convert(parsed)
>>> converted
{'test_vector': array([1., 2., 3.])}
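To give a feel for what a pure-Python parser has to do (this is an independent illustration, not rdata's code), here is a minimal sketch that decodes a single attribute-free logical, integer, or double vector from an XDR serialization stream. It assumes the flag layout from R's serialize.c: the SEXP type in the low 8 bits, the has-attributes bit at 1 << 9, the has-tag bit at 1 << 10, and the type codes LGLSXP = 10, INTSXP = 13, REALSXP = 14. decode_atomic_vector is a hypothetical name.

```python
import struct
from typing import List, Optional, Tuple, Union

LGLSXP, INTSXP, REALSXP = 10, 13, 14
NA_INTEGER = -(2**31)  # R stores NA for integer and logical vectors as INT_MIN
HAS_ATTR, HAS_TAG = 1 << 9, 1 << 10

Value = Optional[Union[bool, int, float]]


def decode_atomic_vector(buf: bytes, pos: int = 0) -> Tuple[List[Value], int]:
    """Decode one vector starting at `pos`; return (values, position after it)."""
    (flags,) = struct.unpack_from(">i", buf, pos)
    sexptype = flags & 0xFF
    if flags & (HAS_ATTR | HAS_TAG):
        raise NotImplementedError("attributes and tags are not handled here")
    (length,) = struct.unpack_from(">i", buf, pos + 4)
    if length < 0:
        raise NotImplementedError("long vectors (length -1 marker) are not handled")
    pos += 8
    if sexptype == REALSXP:
        # Doubles are IEEE 754 big-endian; R's NA_real_ comes out as a float NaN.
        return list(struct.unpack_from(f">{length}d", buf, pos)), pos + 8 * length
    if sexptype in (INTSXP, LGLSXP):
        raw = struct.unpack_from(f">{length}i", buf, pos)
        if sexptype == LGLSXP:
            values = [None if v == NA_INTEGER else bool(v) for v in raw]
        else:
            values = [None if v == NA_INTEGER else v for v in raw]
        return values, pos + 4 * length
    raise NotImplementedError(f"SEXP type {sexptype} is not handled in this sketch")
```

A real .RData file wraps such vectors in a tagged pairlist and adds attributes, strings with per-string encodings, and back-references, which is where most of the work in a library like rdata goes.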
As I said, I developed this package and have used it since without problems, but I did not bother to give it visibility because it was not properly documented. This has recently changed and the documentation is now mostly OK, so here it is for anyone interested:

- Worked for me without much overhead. You deserve a dual upvote :) – Zeel B Patel Feb 23 '21 at 02:02
- I think the "hard++" comment refers to the fact that, as the code (in R) is the definition, the definition can change at any time. Kudos (and upvote) for writing the library, but I hope you have an extensive suite of tests against the official R documentation. – Ketil Malde Mar 17 '22 at 08:51
- I have tests loading all major kinds of R objects, and I (and others) use it continuously to load non-trivial datasets. So far I haven't found a dataset I can't load, but if anyone finds one, I accept issues and review them in a timely manner. – Mabus Mar 18 '22 at 09:24
There is a third-party library called rpy, and you can use it to load .RData files. You can get it via pip: pip install rpy will do the trick. If you don't have rpy and that doesn't work, I suggest that you take a look at how to install it. Once it is installed, you can simply do:
from rpy import *
r.load("file name here")
EDIT:
It seems I'm a little old school there; there's rpy2 now, so you can use that.

The answer by @rsc05 that caters to Notebook users worked for me, but apparently one of the functions (df = pandas2ri.ri2py(df)) has been deprecated, and now it should be df = pandas2ri.rpy2py(df).
So the complete solution should look like this:
# import the libraries
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
# activate automatic pandas conversion
pandas2ri.activate()
# get R's readRDS function
readRDS = robjects.r['readRDS']
# read the .rds file with readRDS
df = readRDS('sri_testing_data.rds')
# convert the data into a native pandas DataFrame
df = pandas2ri.rpy2py(df)
# print the first rows of the DataFrame
print(df.head())

Try this:
!pip install pyreadr
Then:
import pyreadr
result = pyreadr.read_r('/content/nGramsLite.RData')
# let's check what objects we got
print(result.keys())
# odict_keys(['ngram1', 'ngram2', 'ngram3', 'ngram4'])
df1 = result["ngram1"]
df1.head()
Done!!
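Since the question mentions a bunch of files, the same call can be applied to a whole folder. A minimal sketch: load_directory is a hypothetical helper, and it only assumes that the reader (for example pyreadr.read_r) returns a mapping from object names to objects, as shown above. Results are grouped per file so that objects with the same name in different files don't overwrite each other.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Mapping


def load_directory(
    folder: str,
    reader: Callable[[str], Mapping[Any, Any]],
    pattern: str = "*.RData",
) -> Dict[str, Dict[Any, Any]]:
    """Apply `reader` to every file in `folder` matching `pattern`.

    Returns {file name without extension: {object name: object}}.
    """
    loaded: Dict[str, Dict[Any, Any]] = {}
    for path in sorted(Path(folder).glob(pattern)):
        loaded[path.stem] = dict(reader(str(path)))
    return loaded


# Hypothetical usage:
# import pyreadr
# data = load_directory("/content", pyreadr.read_r)
# data["nGramsLite"]["ngram1"].head()
```

The glob pattern is case-sensitive on Linux, so files ending in .rda or .rdata need their own pattern.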
