I'm using R's MatchIt package via Python's rpy2 package, and I transfer results from R to Python. During this transfer I lose the row and column names, but only in one specific situation. I would like to understand what makes the difference here.
R code
First, let me show the original R script. Keep in mind that this script is not executed by Python: the rpy2 package (see the next section) calls into R in a different way. The two variants shown in this code become relevant in the next section.
library("MatchIt")
data("lalonde")
# simplify
lalonde = lalonde[,c("treat", "age", "race", "married")]
# matching
match_out <- matchit(
    treat ~ age + race + married,
    data = lalonde,
    method = "nearest",
    distance = "glm"
)
## Variant A
balance_A <- as.data.frame(summary(match_out)$sum.matched)
## Variant B
sum_matched <- summary(match_out)$sum.matched
balance_B <- as.data.frame(sum_matched)
The objects balance_A and balance_B are equal and look like this:
> balance_A
           Means Treated Means Control Std. Mean Diff. Var. Ratio  eCDF Mean  eCDF Max Std. Pair Dist.
distance      0.56610932     0.3620326       0.9661981  0.6473161 0.13317246 0.4000000       0.9687231
age          25.81621622    28.1027027      -0.3195640  0.4220499 0.08527027 0.1621622       1.1687127
raceblack     0.84324324     0.4702703       1.0258593         NA 0.37297297 0.3729730       1.0258593
racehispan    0.05945946     0.3135135      -1.0743033         NA 0.25405405 0.2540541       1.3028784
racewhite     0.09729730     0.2162162      -0.4012621         NA 0.11891892 0.1189189       0.4742189
married       0.18918919     0.2918919      -0.2622249         NA 0.10270270 0.1027027       0.6762642
Python code
Here is the same approach in Python, using the rpy2 package.
#!/usr/bin/env python3
import rpy2
from rpy2.robjects.packages import importr, data
import rpy2.robjects as robjects
import rpy2.robjects.pandas2ri as pandas2ri
import pydataset
if __name__ == '__main__':
    # For converting objects from/into Pandas <-> R
    # Credits: https://stackoverflow.com/a/20808449/4865723
    pandas2ri.activate()

    # import
    matchit_pkg = robjects.packages.importr('MatchIt')

    # data
    df = robjects.r('''
        library(MatchIt)
        data(lalonde)
        return(lalonde)
    ''')
    df = df.loc[:, ['treat', 'age', 'race', 'married']]

    # get match object
    match_out = robjects.r['matchit'](
        formula=robjects.Formula('treat ~ age + race + married'),
        data=df,
        method='nearest',
        distance='glm')

    ## Variant A
    print('\n-- Variant A --')
    get_balance_dataframe = robjects.r('''f <- function(match_out) {
        result <- as.data.frame(summary(match_out)$sum.matched)
        return(result)
    }
    ''')
    balance_A = get_balance_dataframe(match_out)
    balance_A = robjects.conversion.rpy2py(balance_A)
    print(balance_A)  # <--- OK

    ## Variant B
    print('\n-- Variant B --')
    get_sum_matched = robjects.r('''f <- function(match_out) {
        result <- summary(match_out)$sum.matched
        return(result)
    }
    ''')
    sum_matched = get_sum_matched(match_out)
    print(sum_matched)  # <--- Looks like a matrix

    matrix_to_dataframe = robjects.r('''f <- function(a_matrix) {
        result <- as.data.frame(a_matrix)
        return(result)
    }''')
    balance_B = matrix_to_dataframe(sum_matched)
    balance_B = robjects.conversion.rpy2py(balance_B)
    print(balance_B)  # <--- Names of rows and columns lost
Output
Variant A is OK
This seems OK.
-- Variant A --
          Means Treated  Means Control  Std. Mean Diff.  Var. Ratio  eCDF Mean  eCDF Max  Std. Pair Dist.
distance       0.560643       0.378393         0.898469    0.689696   0.132819  0.400000         0.902191
age           25.816216      28.016216        -0.307476    0.418415   0.086622  0.162162         1.316785
race           1.254054       1.729730        -0.765436    0.643151   0.158559  0.372973         0.765436
married        0.189189       0.308108        -0.303629         NaN   0.118919  0.118919         0.607258
Variant B has a problem
Here the names of columns and rows are lost.
V1 V2 V3 V4 V5 V6 V7
1 0.560643 0.378393 0.898469 0.689696 0.132819 0.400000 0.902191
2 25.816216 28.016216 -0.307476 0.418415 0.086622 0.162162 1.316785
3 1.254054 1.729730 -0.765436 0.643151 0.158559 0.372973 0.765436
4 0.189189 0.308108 -0.303629 NaN 0.118919 0.118919 0.607258
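One possible explanation, stated as an assumption and not verified against rpy2 here: with pandas2ri.activate() in effect, the matrix returned by the first function in Variant B may be converted to a plain NumPy array on its way into Python. A NumPy array carries no row or column names, so when it is sent back to R, as.data.frame() would have nothing to take the names from. The pandas-only sketch below imitates that round trip. The sample values and the variable names labelled, bare and rebuilt are made up for illustration, and no rpy2 call is involved.

```python
import pandas as pd

# Hypothetical stand-in for the labelled R matrix `sum.matched`
labelled = pd.DataFrame(
    [[0.56, 0.38], [25.8, 28.0]],
    index=["distance", "age"],
    columns=["Means Treated", "Means Control"],
)

# Stand-in for the assumed automatic conversion to NumPy:
# only the numbers survive, the labels are dropped
bare = labelled.to_numpy()

# Stand-in for as.data.frame() on a matrix without dimnames:
# default labels are generated instead of the original ones
rebuilt = pd.DataFrame(bare)

print(type(bare).__name__)     # ndarray
print(list(rebuilt.columns))   # [0, 1]
print(list(rebuilt.index))     # [0, 1]
```

If this assumption holds, printing type(sum_matched) in Variant B should show a NumPy array and not an rpy2 matrix object, which would be a quick way to check it.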