Rpy2 Subset data frame

Question

I would like to select COL_TWO where COL_ONE is 'A' from my rdaData data frame, which is loaded from an .rda file in Python.

rdaData:

COL_ONE   COL_TWO
A         12
B         10
A         80

The code:

from rpy2.robjects import r

r.load('path/to/files.rda')
substr_data = r.subset(r('rdaData'), COL_ONE == 'A', select = 'COL_TWO')

When running COL_ONE == 'A' I got this error message:

COL_ONE not defined.

I understand that COL_ONE is treated as a Python variable rather than an R one. So I tried:

substr_data = r.subset(r('rdaData'), 'COL_ONE' == 'A', select = 'COL_TWO')

and

substr_data = r.subset(r('rdaData'), r('rdaData$COL_ONE')== 'A', select = 'COL_TWO')

Both attempts returned no data. When I ran the equivalent code in R, it returned 12 and 80.
Where did I go wrong?

score 0 · Answer 1 · edited May 23 '17 at 12:24

I'm not sure if you're still after this. Here's what I tried:

from rpy2.robjects import r

rnorm = r.rnorm
cbind = r.cbind
DataFrame = r('data.frame')

# Generate a 6 x 5 matrix of random data
n_col = 5
xa = r('matrix')(rnorm(30, 1), ncol=n_col)
# Label column; cbind recycles its 5 letters over the 6 rows
ya = r('LETTERS[seq( from = 1, to = 5 )]')
x = cbind(xa, ya)

# Convert matrix to DataFrame (DF)
xa = DataFrame(x)
print(xa)

# Specify the DF column name used in selecting/slicing, e.g. 'COL_ONE' or 'X6':
new_col = 'X' + str(n_col + 1)
# Slice (or subset()) the R DF; .ro applies R's elementwise == instead of Python's
substr_data = xa.rx(xa.rx2(new_col).ro == 'A', True)
print(substr_data)
# Specify the column name needed from the sliced DF, e.g. 'COL_TWO' or 'X5'
print(substr_data.rx2('X5'))

OUTPUT:

> print xa
                 X1               X2                  X3                  X4
1  -0.4800320535369 2.68521681727218   0.623809846227243   0.810425086281231
2  1.54793994282147 1.39236531245408  -0.424155538823749  -0.242790003122539
3  1.39902476009121 1.23852817727937   0.934250526437131   0.789340066231089
4 0.903770245650284 2.06828848578716 -0.0602365472425763 -0.0602786816600411
5  2.06232894261465 1.39008580471573  -0.800324172073538   0.348292765491598
6 0.475607302003817 2.11744661073875    1.25253406148531  -0.276489137947105
                 X5 X6
1 -1.38012532899756  A
2 -1.70992738271866  B
3   1.7841406565434  C
4  -0.9296857462388  D
5 0.805075070886426  E
6 0.815799142148484  A

> print substr_data
                 X1               X2                X3                 X4
1  -0.4800320535369 2.68521681727218 0.623809846227243  0.810425086281231
6 0.475607302003817 2.11744661073875  1.25253406148531 -0.276489137947105
                 X5 X6
1 -1.38012532899756  A
6 0.815799142148484  A

> print substr_data.rx2('X5')
[1] -1.38012532899756 0.815799142148484
6 Levels: -0.9296857462388 -1.38012532899756 ... 1.7841406565434
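
A note on why the slicing line uses .ro, and why the two rewritten conditions in the question returned nothing: as far as I can tell, in both cases Python evaluates the == itself and subset() only receives a single False. The sketch below is plain Python with no R involved. The Column class and its elementwise_eq method are hypothetical stand-ins that only mimic the difference between a whole-object comparison and an elementwise one; they are not part of rpy2.

```python
# Attempt 2 from the question: Python compares two str literals itself,
# so subset() would only ever receive a single boolean.
condition = 'COL_ONE' == 'A'
print(condition)  # False


class Column:
    """Hypothetical stand-in for an R vector wrapper (not rpy2 itself)."""

    def __init__(self, values):
        self.values = list(values)

    def elementwise_eq(self, other):
        # Plays the role of `.ro ==`: one boolean per element.
        return [v == other for v in self.values]


col_one = Column(['A', 'B', 'A'])

# Attempt 3 has the same shape: a plain == on the wrapper object falls back
# to Python's default equality and yields one bool, not a row mask.
whole_object = col_one == 'A'
print(whole_object)  # False

# What the row filter actually needs: a mask with one entry per row.
mask = col_one.elementwise_eq('A')
print(mask)  # [True, False, True]
```

With a mask like the last one, rows 1 and 3 of the question's table would be kept, which matches the 12 and 80 seen in R.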

The final result is a FactorVector. I'm not sure how to get the numbers out of it (as a list, for example), but the DF slicing part of this approach should address your question.
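
One possible way to get numbers back, sketched in plain Python: an R factor is a vector of 1-based integer codes plus a vector of string levels, so decoding means looking each code up in the levels and converting the label to float. This is the same idea as R's as.numeric(as.character(f)). The levels and codes lists below are hand-written stand-ins based on the X5 output above (the level order is my assumption from the truncated "Levels" line), and factor_to_floats is a hypothetical helper. With rpy2 I would expect the codes to come from iterating the FactorVector and the labels from its levels attribute, but I have not verified that here.

```python
# Hand-written stand-in for the factor printed above (assumed level order).
levels = [
    '-0.9296857462388',
    '-1.38012532899756',
    '-1.70992738271866',
    '0.805075070886426',
    '0.815799142148484',
    '1.7841406565434',
]
# 1-based codes, as R stores them: the two rows kept by the slice.
codes = [2, 5]


def factor_to_floats(codes, levels):
    """Map each 1-based factor code to its level label, then parse it as float."""
    return [float(levels[code - 1]) for code in codes]


values = factor_to_floats(codes, levels)
print(values)  # [-1.38012532899756, 0.815799142148484]
```

Converting the codes directly with float() would give 2.0 and 5.0, which is why the lookup through the levels is needed.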
