I am using the randomForest library in R via RPy2. I would like to pass back the values calculated using the caret predict method and join them to the original pandas dataframe. See the example below.
import pandas as pd
import numpy as np
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
r = robjects.r
r.library("randomForest")
r.library("caret")
df = pd.DataFrame(data=np.random.rand(100, 10), columns=["a{}".format(i) for i in range(10)])
df["b"] = ['a' if x < 0.5 else 'b' for x in np.random.sample(size=100)]
train = df.loc[df.a0 < .75]
withheld = df.loc[df.a0 >= .75]
rf = r.randomForest(robjects.Formula('b ~ .'), data=train)
pr = r.predict(rf, withheld)
print(pr.rx())
Which returns
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
a a b b b a a a a b a a a a a b a a a a
Levels: a b
But how can I join this to the withheld dataframe, or compare it to the original values?
I have tried this:
import pandas.rpy.common as com
com.convert_robj(pr)
But this returns a dictionary where the keys are strings. I guess there is a workaround: call withheld.reset_index(), convert the dict keys to integers, and then join the two. But there must be a simpler way!
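For reference, here is a minimal sketch of that workaround in plain pandas, without any R calls. The small withheld frame and the pred_dict dictionary are hypothetical stand-ins, not the real output of com.convert_robj. It also assumes the string keys are 1-based row positions, as the printed R output suggests, and it attaches the predictions by position so the original index is kept rather than reset.

```python
import pandas as pd

# Hypothetical stand-in for the withheld rows; note the non-contiguous index.
withheld = pd.DataFrame(
    {"a0": [0.80, 0.91, 0.77], "b": ["a", "b", "a"]},
    index=[4, 17, 42],
)

# Hypothetical stand-in for the converted predictions. The string keys are
# assumed to be 1-based row positions, as in the printed R output.
pred_dict = {"1": "a", "2": "a", "3": "b"}

# Convert the keys to integers so that sorting is numeric ("10" after "9").
pred = pd.Series(pred_dict)
pred.index = pred.index.astype(int)
pred = pred.sort_index()

# Attach by position rather than by label, so the original index survives.
result = withheld.copy()
result["predicted"] = pred.to_numpy()

# Compare the predictions with the original values.
result["correct"] = result["b"] == result["predicted"]
print(result)
```

This only works if the predictions come back in the same row order as withheld, which is something I would still need to verify for the real output.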