I'm working on a machine learning (data mining) project. I've finished the data exploration and data preparation steps, both done in Python.
Now I'm facing this issue: I have categorical attributes in my dataset. After some research, I found that the most appropriate algorithms for this kind of data are decision trees or random forest classifiers.
However, I've read some similar questions about decision trees and categorical attributes and found that the library I'm using (scikit-learn) doesn't work with categorical attributes (check here and here). To make it work, I would need to encode my categorical variables as numerical ones. I don't want to use encoding because, according to this answer, I would lose some properties of my attributes and some information. Also, some of my attributes have more than 100 distinct values.
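For context, a decision tree does not inherently need numeric inputs. ID3/C4.5-style trees (Weka's J48 is a C4.5 implementation) split a categorical attribute into one branch per distinct value. The from-scratch sketch below is only an illustration of that idea, not a library recommendation; every function name and the toy weather data are hypothetical, and it assumes each row is a dict of string values with plain string class labels.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain of a multiway split with one branch per distinct value of `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """ID3-style tree: a leaf is a label, a node is (attr, {category: subtree})."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        # No attributes left to split on: fall back to the majority label.
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, branches)

def predict(tree, row, default=None):
    """Walk the tree using the raw category strings; no encoding involved."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row.get(attr) not in branches:
            return default  # category never seen during training
        tree = branches[row[attr]]
    return tree

# Hypothetical toy data with purely categorical attributes.
rows = [
    {"outlook": "sunny", "wind": "weak"},
    {"outlook": "sunny", "wind": "strong"},
    {"outlook": "overcast", "wind": "weak"},
    {"outlook": "rain", "wind": "weak"},
    {"outlook": "rain", "wind": "strong"},
    {"outlook": "overcast", "wind": "strong"},
]
labels = ["no", "no", "yes", "yes", "no", "yes"]
tree = build_tree(rows, labels, ["outlook", "wind"])
print(tree)
print(predict(tree, {"outlook": "rain", "wind": "strong"}))
```

One caveat relevant to attributes with 100+ values: plain information gain is biased toward attributes with many distinct values, which is why C4.5 uses the gain ratio instead.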
So I want to know :
- Is there any other Python library that can build decision trees with categorical data without any encoding?
- In this answer, it was suggested that other libraries like Weka can build decision trees with categorical attributes, so my question is: can I combine two languages in the same machine learning project?
I would do data exploration and preparation in Python, train the model in Weka (Java), and deploy it in a Python Flask web app. Is that possible?
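As a rough sketch of the mixed Python/Weka pipeline described in the second question, the Python side could export the prepared data to Weka's ARFF format (declaring each categorical attribute as nominal) and then invoke Weka's command-line J48 classifier. The helper names are hypothetical, the code assumes attribute values contain no spaces or commas (no ARFF quoting is done), and the jar path is an assumption. The options used are Weka's standard evaluation flags (`-t` training file, `-d` save model, `-l` load model, `-T` test file, `-p 0` print predictions); parsing the prediction output is not shown, and the test file must use the same ARFF header as the training file. A JVM bridge such as the python-weka-wrapper3 package is another possible route.

```python
import subprocess  # used only if you actually launch Weka (see the comment at the end)

def to_arff(relation, rows, attrs, class_attr):
    """Serialize rows (dicts of strings) to ARFF, declaring every attribute as nominal."""
    columns = attrs + [class_attr]
    lines = [f"@relation {relation}", ""]
    for attr in columns:
        values = sorted({row[attr] for row in rows})
        lines.append(f"@attribute {attr} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(row[a] for a in columns))
    return "\n".join(lines) + "\n"

def j48_train_command(weka_jar, train_arff, model_path):
    # Train J48 on train_arff and save the serialized model to model_path.
    return ["java", "-cp", weka_jar, "weka.classifiers.trees.J48",
            "-t", train_arff, "-d", model_path]

def j48_predict_command(weka_jar, model_path, test_arff):
    # Load the saved model and print one prediction per test instance.
    return ["java", "-cp", weka_jar, "weka.classifiers.trees.J48",
            "-l", model_path, "-T", test_arff, "-p", "0"]

sample_rows = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]
print(to_arff("weather", sample_rows, ["outlook"], "play"))
print(j48_train_command("weka.jar", "train.arff", "j48.model"))

# With Java and weka.jar available, a Flask view could run, for example:
# subprocess.run(j48_predict_command("weka.jar", "j48.model", "test.arff"),
#                capture_output=True, text=True, check=True)
```

Starting a JVM per request is slow, so in a web app this approach is usually better suited to batch scoring than to per-request predictions.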