I am revisiting a random forest project I did a few years ago in R. The workflow was to:
- Generate a proximity matrix of the input data using an unsupervised random forest.
- Calculate a distance matrix from this proximity matrix and pass it to the Partitioning Around Medoids (PAM) clustering algorithm.
- Using the clusters obtained through PAM as labels, run a random forest in supervised mode to train a new model.
- Use this model to predict on another dataset from a future point in time.
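To make the first two steps concrete, here is a rough sketch of what I think they amount to in sklearn. It is a sketch under assumptions, not a verified equivalent of what R's randomForest does: the helper name `rf_proximity`, the toy data, the choice to build the synthetic class by permuting each column independently, and the choice of `1 - proximity` as the distance are all my own guesses. The only sklearn pieces it relies on are `RandomForestClassifier` and its `apply()` method, which returns the leaf index each sample lands in for every tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rf_proximity(X, n_estimators=200, random_state=0):
    """Hypothetical helper: unsupervised random forest proximity matrix."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]

    # Assumption: build a synthetic "fake" class by permuting each column
    # independently, which keeps the marginals but breaks feature dependencies.
    X_synth = np.column_stack([rng.permutation(col) for col in X.T])

    # Real rows are labelled 1, synthetic rows 0; the forest learns to separate them.
    X_all = np.vstack([X, X_synth])
    y_all = np.concatenate([np.ones(n_samples), np.zeros(n_samples)])
    forest = RandomForestClassifier(n_estimators=n_estimators,
                                    random_state=random_state)
    forest.fit(X_all, y_all)

    # Leaf index of every real sample in every tree: (n_samples, n_estimators).
    leaves = forest.apply(X)

    # Proximity = fraction of trees in which two samples share a leaf.
    proximity = np.zeros((n_samples, n_samples))
    for tree_leaves in leaves.T:
        proximity += tree_leaves[:, None] == tree_leaves[None, :]
    proximity /= leaves.shape[1]
    return proximity


# Toy data (made up for illustration): two loose groups of 20 samples each.
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([demo_rng.normal(0, 1, (20, 4)),
                    demo_rng.normal(5, 1, (20, 4))])

proximity = rf_proximity(X_demo)
distance = 1.0 - proximity  # assumption: simplest proximity-to-distance conversion
```

The resulting `distance` array is what I would expect to hand to a PAM implementation that accepts a precomputed distance matrix. Note that the pairwise comparison needs memory on the order of the number of samples squared, so this would likely have to be chunked for large datasets.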
I have shifted my workflow to Python for many of my projects, as the language is very flexible and fun, but I am still getting my bearings in sklearn compared to how I performed such tasks in R. My hangup is in producing a proximity matrix (or some container holding the proximity between samples) to be passed to PAM. I have found the following post, which describes a similar issue, but I have been unable to find a way to implement what the author of the accepted answer suggests.
Any clues as to how to implement this? Any help would be greatly appreciated, and I will be sure to return the favor to the larger community. I know there are lots of other R-to-Python converts out there who would benefit from this sort of information.
Thanks in advance, and apologies if there is a simple solution that I am overlooking.