I've been trying to learn Python 3.6 using the Anaconda distribution. I've hit a snag with the online course I'm using and could use some help working through a few error messages. I'd ask the course instructors, but they don't seem very responsive to students' questions.
I've been having some trouble with the classes commonly used to recode categorical data. As I understand it, the scikit-learn package provides three classes for recoding variables: LabelEncoder, OneHotEncoder and LabelBinarizer. I have tried each of them to recode a categorical variable in a dataset, but I keep getting errors.
Please pardon my relative noobness in the code samples. As you might guess from how basic my question is, I am not well versed in Python.
The object X contains a few columns, the first of which holds categorical strings that I need to convert. (If someone could also tell me how to insert tables, that'd be helpful. Do I have to use HTML?) Here are some sample rows:
"Fish" 1 5 3
"Dog" 2 6 9
"Dog" 8 8 8
"Cat" 5 7 6
"Cat" 6 6 6
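For anyone who wants to reproduce the errors, here is a minimal sketch of X. It assumes, based on the X[:, 0] indexing in the attempts below, that X is a 2-D NumPy array. The dtype=object argument is also an assumption; it keeps the numbers as integers, whereas a plain np.array call would convert every cell to a string to match the first column.

```python
import numpy as np

# Assumption: X is a 2-D NumPy array built from the sample rows.
# dtype=object keeps "Fish" etc. as strings and the numbers as ints;
# without it, NumPy would convert every value to a string.
X = np.array([
    ["Fish", 1, 5, 3],
    ["Dog", 2, 6, 9],
    ["Dog", 8, 8, 8],
    ["Cat", 5, 7, 6],
    ["Cat", 6, 6, 6],
], dtype=object)

print(X.shape)  # (5, 4): five rows, four columns
print(X[:, 0])  # the first column, i.e. the categorical strings to recode
```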
LabelEncoder Attempt
Below is the code I tried and the error message it produced for the object X, which has roughly the structure described above.
from sklearn.preprocessing import LabelEncoder
labelencoder_X =LabelEncoder
X[:, 0] = LabelEncoder.fit_transform(X[:, 0])
TypeError: fit_transform() missing 1 required positional argument: 'y'
What is throwing me is that I thought the code above clearly defined y as the first column of X.
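The error appears to come from labelencoder_X = LabelEncoder (no parentheses), which stores the class itself rather than an instance, followed by a call to fit_transform on the class LabelEncoder. Called on the class, the single argument X[:, 0] fills the self slot, which is why Python reports y as missing. A minimal sketch of the fix, assuming X is the object array built from the sample rows:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Assumption: X is a NumPy object array holding the sample rows.
X = np.array([
    ["Fish", 1, 5, 3],
    ["Dog", 2, 6, 9],
    ["Dog", 8, 8, 8],
    ["Cat", 5, 7, 6],
    ["Cat", 6, 6, 6],
], dtype=object)

# Create an *instance* (note the parentheses), then call fit_transform
# on that instance, passing the first column as the single argument y.
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])

print(labelencoder_X.classes_)  # learned categories, in sorted order
print(X[:, 0])                  # the integer code for each row
```

scikit-learn's documentation describes LabelEncoder as intended for target labels (y), which is also why its argument is named y. It assigns the categories an arbitrary order (Cat=0, Dog=1, Fish=2 here) that some models may treat as meaningful, so one-hot encoding is the more common choice for a feature column.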
OneHotEncoder Attempt
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform[X].toarray()
TypeError: 'method' object is not subscriptable
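This error has a different cause: fit_transform[X] uses square brackets, so Python tries to index the method instead of calling it; it needs to be fit_transform(X). Separately, the categorical_features argument was deprecated in scikit-learn 0.20 and removed in 0.22, so on a recent release that line fails even with the brackets fixed. A sketch of one common replacement, assuming a recent scikit-learn and the same object array X, wraps OneHotEncoder in a ColumnTransformer that encodes column 0 and passes the other columns through unchanged:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Assumption: X is a NumPy object array holding the sample rows.
X = np.array([
    ["Fish", 1, 5, 3],
    ["Dog", 2, 6, 9],
    ["Dog", 8, 8, 8],
    ["Cat", 5, 7, 6],
    ["Cat", 6, 6, 6],
], dtype=object)

# One-hot encode column 0 and pass columns 1-3 through as they are.
# sparse_threshold=0 makes the result a dense NumPy array, so no .toarray().
onehot = ColumnTransformer(
    [("animal", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = onehot.fit_transform(X)  # round brackets: this calls the method

# Categories are sorted, so the dummy columns are Cat, Dog, Fish, in that order.
print(onehot.named_transformers_["animal"].categories_)
print(X_encoded.shape)  # (5, 6): 3 dummy columns + 3 original columns
```

With this setup the first row becomes 0, 0, 1, 1, 5, 3. Because X is an object array, the result is one too; X_encoded.astype(float) gives a plain numeric array if a model needs one.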
LabelBinarizer
I've found this one the hardest to understand, and couldn't even work out how to attempt it given the structure of the dataset.
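LabelBinarizer expects a single 1-D column of labels rather than the whole of X, which may be why the dataset's structure made it hard to apply. One way to use it, sketched below with the same assumed object array X, is to binarize only the first column and then place the result next to the remaining columns with np.hstack (a NumPy function, not part of LabelBinarizer):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Assumption: X is a NumPy object array holding the sample rows.
X = np.array([
    ["Fish", 1, 5, 3],
    ["Dog", 2, 6, 9],
    ["Dog", 8, 8, 8],
    ["Cat", 5, 7, 6],
    ["Cat", 6, 6, 6],
], dtype=object)

lb = LabelBinarizer()
# Pass only the first column: LabelBinarizer works on a 1-D array of labels.
animal_dummies = lb.fit_transform(X[:, 0])  # one 0/1 column per class

# Place the dummy columns in front of the untouched numeric columns.
X_encoded = np.hstack([animal_dummies, X[:, 1:]])

print(lb.classes_)      # column order of the dummies (sorted labels)
print(X_encoded.shape)  # (5, 6)
```

One caveat: when a column has exactly two categories, LabelBinarizer returns a single 0/1 column rather than two, so the output width depends on the data. That is one reason OneHotEncoder is usually suggested for feature columns.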
Any guidance or suggestions you could provide would be endlessly helpful.