I've been using sklearn for machine learning modelling over the last couple of years and have grown accustomed to what seems like a very logical and cohesive framework:
from sklearn.ensemble import RandomForestClassifier
# define a model
clf = RandomForestClassifier()
# fit the model to the training data
clf.fit(X, y)
# predict class probabilities on a test set (probability of the positive class)
preds = clf.predict_proba(X_test)[:, 1]
I'm now trying to learn some R and want to start doing some of the same things I was doing in sklearn. The first thing you notice coming from the sklearn world is the diverse syntax across packages, which is understandable but kind of inconvenient. caret seems like a nice solution to that problem, since it creates a cohesive interface across the different R packages (e.g. randomForest, gbm, ...). Still, I'm puzzled by some of the default choices (e.g. the train() method seems to default to some sort of grid search). Also, caret seems to use plyr behind the scenes, which masks some of dplyr's functions, such as summarise. Since I do a lot of data manipulation with dplyr, that's kind of a problem.

Can you help me figure out what caret's equivalent of sklearn's model/fit/predict_proba workflow is? And is there a way to deal with the plyr/dplyr issue?
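For concreteness, here's roughly what I imagine the caret equivalent would look like, just a sketch based on what I've skimmed so far. X, y and X_test are the same hypothetical objects as above (with y as a factor this time), and the mtry value in tuneGrid is a placeholder I picked only to keep train() from building a tuning grid; I'm not sure these are the right knobs to turn:

library(caret)
# define and fit a model; trainControl(method = "none") together with a
# one-row tuneGrid should (I think) skip the resampling and grid search
fit <- train(x = X, y = y,
             method = "rf",
             trControl = trainControl(method = "none", classProbs = TRUE),
             tuneGrid = data.frame(mtry = 2))
# class probabilities on the test set; taking column 2 to mirror [:, 1] in sklearn
preds <- predict(fit, newdata = X_test, type = "prob")[, 2]

Is that roughly the intended pattern, or am I fighting the defaults more than I need to?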