Save 2d numpy array to R file format using rpy2

Question

This is a beginner's question but how do you save a 2d numpy array to a file in (compressed) R format using rpy2? To be clear, I want to save it in rpy2 and then later read it in using R. I would like to avoid csv as the amount of data will be large.

Skylar Saveland · Accepted Answer · 2012-07-20T21:57:25.913

7

It looks like you want R's save command. I would use the pandas R interface and do something like the following:

import numpy as np
from rpy2.robjects import r
import pandas.rpy.common as com  # pandas' rpy2 bridge (removed in later pandas releases)
from pandas import DataFrame

a = np.array([range(5), range(5)])
df = DataFrame(a)
df = com.convert_to_r_dataframe(df)  # pandas DataFrame -> R data.frame
r.assign("foo", df)                  # bind it to the name "foo" in R
r("save(foo, file='here.gzip', compress=TRUE)")

There may be a more elegant way, though; I'm open to better suggestions. In R, the saved file would then be used like this:

> load("here.gzip")
> foo
  X0 X1 X2 X3 X4
0  0  1  2  3  4
1  0  1  2  3  4

You can bypass pandas and use numpy2ri from rpy2 instead, with something like:

import numpy as np
from rpy2.robjects import r
from rpy2.robjects.numpy2ri import numpy2ri

a = np.array([[i*2147483647**2 for i in range(5)], range(5)], dtype="uint64")
# convert to double precision numeric, since R doesn't have unsigned ints
a = np.array(a, dtype="float64")
ro = numpy2ri(a)
r.assign("bar", ro)
r("save(bar, file='another.gzip', compress=TRUE)")

In R then:

> load("another.gzip")
> bar
     [,1]         [,2]         [,3]         [,4]         [,5]
[1,]    0 4.611686e+18 9.223372e+18 1.383506e+19 1.844674e+19
[2,]    0 1.000000e+00 2.000000e+00 3.000000e+00 4.000000e+00
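
One caveat with the conversion to float64 above: a double has a 53-bit significand, so integers beyond 2**53 are generally rounded, and values such as i*2147483647**2 will not all round-trip exactly. A minimal sketch for checking this before saving, using plain Python floats (doubles on typical platforms, like numpy's float64); the helper name survives_float64 is made up for this illustration:

```python
def survives_float64(n):
    """Return True if the integer n is exactly representable as a double."""
    return int(float(n)) == n


# Values from the example above: i * 2147483647**2 for i in range(5)
big_values = [i * 2147483647**2 for i in range(5)]
lossy = [n for n in big_values if not survives_float64(n)]

print(len(lossy), "of", len(big_values), "values would be rounded")
```

If any value is flagged, the R matrix will hold the nearest double rather than the original integer.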

edited Jul 20 '12 at 21:57

answered Jul 20 '12 at 20:41

Skylar Saveland

Thanks but installing pandas under ubuntu 11.10 fails with error: Setup script exited with pandas requires NumPy >= 1.6 due to datetime64 dependency – Simd Jul 20 '12 at 21:10
I'm not sure how to do it without pandas. Can you upgrade your numpy? I usually use `virtualenv` and `pip` which will install the latest stable `numpy` and `pandas` for you. – Skylar Saveland Jul 20 '12 at 21:13
Upgrading numpy will be a pain and also make the script less portable sadly. I feel rpy2 should be able to call save too if I can just get the right syntax for it. – Simd Jul 20 '12 at 21:18
added a pure rpy2 example; resulting R objects are a little different, this is probably what you want. – Skylar Saveland Jul 20 '12 at 21:27
Thanks! I have upvoted. I now get the annoying ("Cannot convert numpy array of unsigned values -- R does not have unsigned integers.") which I suppose is the next thing to worry about :) – Simd Jul 20 '12 at 21:34
I have values like 5688343225308272000L in the array. I assume R can represent large numbers too somehow. – Simd Jul 20 '12 at 21:50
https://stat.ethz.ch/pipermail/r-help/2012-January/300250.html I have a 32bit signed integer on my machine here. You can try to change the dtype of the array but you might be out of luck http://docs.scipy.org/doc/numpy/user/basics.types.html – Skylar Saveland Jul 20 '12 at 21:51
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/14203/discussion-between-raphael-and-skyl) – Simd Jul 20 '12 at 22:01
Using the second approach, I get the following error: `ImportError: cannot import name 'numpy2ri' from 'rpy2.robjects.numpy2ri' ` – Johannes Wiesner Feb 09 '23 at 17:32

Phil Cooper · Answer 2 · 2012-07-20T21:47:31.460

2

Here's an example without pandas that adds column and row names:

import warnings

import numpy as np
from rpy2.robjects import rinterface, r, IntVector, FloatVector, StrVector

# older (<2.1) versions of rpy2 have globalEnv vs globalenv
# let's fix it a little
if not hasattr(rinterface, 'globalenv'):
    warnings.warn('Old version of rpy2 detected')
    rinterface.globalenv = rinterface.globalEnv

var_name = 'r_var'
vals = np.arange(20,dtype='float').reshape(4,5)

# transpose because R is column major vs python is row major 
r_vals = FloatVector(vals.T.ravel())
# make it  a matrix
rinterface.globalenv[var_name]=r['matrix'](r_vals,nrow=vals.shape[0])
# give it some row and column names
r("rownames(%s) <- c%s"%(var_name,tuple('ABCDEF'[i] for i in range(vals.shape[0]))))
r("colnames(%s) <- c%s"%(var_name,tuple(range(vals.shape[1]))))

#save it to file
r.save(var_name,file='r_from_py.rdata')
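
To see why the transpose is needed before flattening, here is a small pure-Python sketch of the two orderings. R's matrix() fills column by column by default, so the flat vector has to arrive in column-major order. The function names are invented for this illustration.

```python
def flatten_row_major(rows):
    """Flatten row by row, the order numpy's default ravel() produces."""
    return [value for row in rows for value in row]


def flatten_column_major(rows):
    """Flatten column by column, the order R's matrix() expects by default."""
    return [value for column in zip(*rows) for value in column]


vals = [[0, 1, 2],
        [3, 4, 5]]

print(flatten_row_major(vals))     # [0, 1, 2, 3, 4, 5]
print(flatten_column_major(vals))  # [0, 3, 1, 4, 2, 5]
```

With numpy, vals.T.ravel() as used above yields the column-major order; vals.ravel(order='F') is an equivalent spelling.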

edited Jul 20 '12 at 21:47

answered Jul 20 '12 at 21:41

Phil Cooper

Thanks. Is FloatVector changing the type from unsigned int as well as transposing (see my comment to the first answer)? – Simd Jul 20 '12 at 21:59
@Raphael FloatVector creates a float but I also tested a version of the above with IntVector (with dtype='int') and had no errors. – Phil Cooper Jul 20 '12 at 22:16
In my case the data looks like [(5, 'text', 4) (3, 'more text', 2)...] so FloatVector gives me an error. – Simd Jul 20 '12 at 22:19
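
For record-like data such as the tuples in the last comment, one possible approach (a sketch, not something confirmed in this thread) is to split the records into one list per column, so that each column can be wrapped in a vector type matching its contents, for instance the IntVector and StrVector classes imported in the answer. The helper name is illustrative; the sample records come from the comment.

```python
def records_to_columns(records):
    """Turn a list of equal-length tuples into one list per column."""
    return [list(column) for column in zip(*records)]


records = [(5, 'text', 4), (3, 'more text', 2)]
first, label, last = records_to_columns(records)

print(first)  # [5, 3]
print(label)  # ['text', 'more text']
print(last)   # [4, 2]
```

Each list could then be passed to IntVector or StrVector and combined on the R side; that step is left out here because it depends on the rpy2 version in use.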

score 2 · Answer 3 · edited May 23 '17 at 12:10

An alternative to rpy2 is to write a mat-file and load this mat-file from R.

In Python:

import os
import numpy as np
import scipy.io

os.chdir("/home/user/proj")  # specify a path to save to
x = np.linspace(0, 2 * np.pi, 100)
y = np.cos(x)
scipy.io.savemat('test.mat', dict(x=x, y=y))

Example copied from: "Converting" Numpy arrays to Matlab and vice versa

In R:

library(R.matlab)
object_list = readMat("/home/user/proj/test.mat")

I'm a beginner in Python.

score 2 · Answer 4 · answered Sep 27 '17 at 19:26

Suppose you have a DataFrame called data. The following code helped me store it as a matrix in an R data file and then load it into R (RStudio).

Save the data to an R file:

# Take only the values of the dataframe
B=data.values

import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()

nr,nc = B.shape
Br = ro.r.matrix(B, nrow=nr, ncol=nc)

ro.r.assign("B", Br)
ro.r("save(B, file='here.Rdata')")

Then go to R and run this:

load("D:/.../here.Rdata")

This has done the job for me!
