The binary response variable obese is coded 1 for obese and 0 for not obese; weight is a continuous predictor.
Using a random forest (RF) to classify obese:
library(randomForest)
rf <- randomForest(factor(obese)~weight)
This gives a fitted object containing:
> summary(rf)
                Length Class  Mode
call                 2 -none- call
type                 1 -none- character
predicted          100 factor numeric
err.rate          1500 -none- numeric
confusion            6 -none- numeric
votes              200 matrix numeric
oob.times          100 -none- numeric
classes              2 -none- character
importance           1 -none- numeric
importanceSD         0 -none- NULL
localImportance      0 -none- NULL
proximity            0 -none- NULL
ntree                1 -none- numeric
mtry                 1 -none- numeric
forest              14 -none- list
y                  100 factor numeric
test                 0 -none- NULL
inbag                0 -none- NULL
terms                3 terms  call
I believe the votes matrix gives, for each case, the proportion of votes (between 0 and 1) that the RF casts for each class, where 0 = not obese and 1 = obese:
> head(rf$votes, 20)
           0          1
1  0.9318182 0.06818182
2  0.9325843 0.06741573
3  0.2784091 0.72159091
4  0.9040404 0.09595960
5  0.3865979 0.61340206
6  0.9689119 0.03108808
7  0.8187135 0.18128655
8  0.7170732 0.28292683
9  0.6931217 0.30687831
10 0.9831461 0.01685393
11 0.3425414 0.65745856
12 1.0000000 0.00000000
13 0.9728261 0.02717391
14 0.9848485 0.01515152
15 0.8783069 0.12169312
16 0.8553459 0.14465409
17 1.0000000 0.00000000
18 0.3389831 0.66101695
19 0.9316770 0.06832298
20 0.9435897 0.05641026
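A quick way to check that reading of the votes matrix is to confirm that each row sums to 1, so that the two columns are complements of each other. The sketch below is only illustrative: the data are simulated stand-ins (the seed, sample size and the weight/obese relationship are assumptions, not the original data), and it relies on the default norm.votes = TRUE in randomForest.

```r
library(randomForest)

# Simulated stand-in data (an assumption; the original data are not shown)
set.seed(1)
n <- 100
weight <- rnorm(n, mean = 80, sd = 15)
obese <- rbinom(n, size = 1, prob = plogis((weight - 85) / 5))

rf <- randomForest(factor(obese) ~ weight)

# With the default norm.votes = TRUE, each row holds vote fractions,
# so every row should sum to 1
rows_sum_to_one <- isTRUE(all.equal(unname(rowSums(rf$votes)), rep(1, n)))
print(rows_sum_to_one)

# Equivalently, column "0" should equal 1 minus column "1"
complementary <- isTRUE(all.equal(unname(rf$votes[, 1]),
                                  unname(1 - rf$votes[, 2])))
print(complementary)
```

If both checks hold, votes_1 carries exactly the same information as votes_2, just on a reversed scale.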
Extracting the two columns:
votes_2 <- rf$votes[,2]
votes_1 <- rf$votes[,1]
My question is: why do
pROC::plot.roc(obese, votes_1)
and
pROC::plot.roc(obese, votes_2)
produce the same result?
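One diagnostic that may help narrow this down is to compare pROC's default direction = "auto" with an explicitly fixed direction. The sketch below again uses simulated stand-in data (an assumption, not the original data) and sets levels = c(0, 1) explicitly. If the automatic direction choice is what makes the two curves coincide, then forcing the same direction in both calls should instead give AUCs that sum to 1.

```r
library(randomForest)
library(pROC)

# Simulated stand-in data (an assumption; the original data are not shown)
set.seed(1)
n <- 100
weight <- rnorm(n, mean = 80, sd = 15)
obese <- rbinom(n, size = 1, prob = plogis((weight - 85) / 5))

rf <- randomForest(factor(obese) ~ weight)
votes_1 <- rf$votes[, 1]
votes_2 <- rf$votes[, 2]

# Default direction = "auto": pROC chooses the comparison direction per call
auto_1 <- suppressMessages(roc(obese, votes_1, levels = c(0, 1)))
auto_2 <- suppressMessages(roc(obese, votes_2, levels = c(0, 1)))
print(c(auto_1$direction, auto_2$direction))
auc_auto <- c(as.numeric(auc(auto_1)), as.numeric(auc(auto_2)))
print(auc_auto)

# Same direction forced for both predictors
fixed_1 <- roc(obese, votes_1, levels = c(0, 1), direction = "<")
fixed_2 <- roc(obese, votes_2, levels = c(0, 1), direction = "<")
auc_fixed <- c(as.numeric(auc(fixed_1)), as.numeric(auc(fixed_2)))
print(auc_fixed)
```

Comparing the printed directions and AUCs for the "auto" and fixed cases should show whether the direction setting accounts for the identical plots.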