
I have a number of saved R datasets (though they have .R extensions). I can access one of these matrices, fname.R, in R by calling load("fname.R") and then entering the variable name fname.

However, I would like to use this matrix in Python. I could use rpy2 to import the data, but I am interested in manipulating this data as well. How can I turn it into a Python matrix?

David Robinson
ShanZhengYang
  • This might help: http://stackoverflow.com/questions/21288133/loading-rdata-files-into-python – C_Z_ Aug 27 '15 at 16:55
  • What kind of variables are stored in these files (that is to say, the `fname` variables?) Data frames? Vectors? Lists? Something else? How you approach them will depend on the data type. Also, note that while these were apparently saved in R, it is more typical to save them as ".rda" or ".RData" (the extension doesn't matter, but knowing the convention will help with your Googling for solutions) – David Robinson Aug 27 '15 at 17:04
  • @DavidRobinson They look to me like 5000 matrices, each with 25 values. – ShanZhengYang Aug 27 '15 at 17:21
  • You mean that there are 5000 files, each one containing a single matrix? Just to check this, could you do `load("fname.R"); dput(fname)` in R, and edit the result into your question? This would create a reproducible version of that object. (Don't do this if your file contains sensitive or personalized information that you're not comfortable being public online) – David Robinson Aug 27 '15 at 17:23
  • @DavidRobinson Edited above. I'm not sure how `R` would exactly categorize this data. A long vector? – ShanZhengYang Aug 27 '15 at 17:46
  • That's a matrix (note the dimensions specified at the end, 4000 by 3023). Do you mind if I edit your question title/body to clarify this? That would help make it clearer what you're asking and help get an answer. I might be able to answer it as well. – David Robinson Aug 27 '15 at 17:48
  • @DavidRobinson Edit away, you have my permission. Thanks! – ShanZhengYang Aug 27 '15 at 19:10

1 Answer


The solution combines two other Stack Overflow questions/answers: one shows how to load a variable from an RData file, and the other shows how to convert an R matrix to a numpy array.

Combined, the solution looks like this:

import rpy2.robjects as robjects
import numpy as np

# load your file
robjects.r['load']('fname.RData')

# retrieve the matrix that was loaded from the file
matrix = robjects.r['fname']

# turn the R matrix into a numpy array
a = np.array(matrix)

print(a)

For instance, if you'd started by running the following code in R:

fname <- matrix(1:9, nrow = 3)
save(fname, file = "fname.RData")

The above Python code would print:

[[1 4 7]
 [2 5 8]
 [3 6 9]]
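
Once the data is a numpy array, it can be manipulated like any other array. The following is only a rough sketch: it does not call rpy2 at all, and instead builds a stand-in for the array printed above with numpy alone (R fills matrices column by column, which corresponds to order="F" in numpy). The names col_sums, row_means and first_col are made up for illustration.

```python
import numpy as np

# Stand-in for np.array(robjects.r['fname']) from the example above.
# R's matrix(1:9, nrow = 3) fills column by column, hence order="F".
a = np.arange(1, 10).reshape((3, 3), order="F")
print(a)

# A few typical manipulations (illustrative names, not from the question)
col_sums = a.sum(axis=0)    # sum of each column
row_means = a.mean(axis=1)  # mean of each row
first_col = a[:, 0]         # first column, like fname[, 1] in R

print(col_sums)
print(row_means)
print(first_col)
```

Note that numpy indexing starts at 0, whereas R indexing starts at 1.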
David Robinson