Columns of different types as features

Question

I have data of different types and I want to predict the dependent variable Y from the variables A, B and C shown below.

    Y       A       B     C
0   11.3914 2.75    0     [0, 0, 10, 17, 35, 26, 0]
1   14.0348 2.50    0     [0, 0, 39, 35, 30, 5, 0]  
2   14.8416 2.75    1     [0, 0, 12, 5, 5, 2, 1]
3   13.7829 2.25    0     [0, 0, 2, 18, 14, 8, 0]   
...

The following attempt raises ValueError: setting an array element with a sequence. on the fit line.

X = df[['A', 'B', 'C']]
y = df['Y']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
tree_reg = DecisionTreeRegressor()  
tree_reg.fit(X_train, y_train)

y_pred = regressor.predict(X_test)

I assumed this was because of the array data in C, but when I try to predict with only variables A and B, i.e. X = df[['A', 'B']]

I get another error, this time in the final predict line: ValueError: Number of features of the model must match the input. Model n_features is 7 and input n_features is 2

What am I doing wrong? How can I include each of these features in X?

score 0 · Accepted Answer · answered Jun 16 '19 at 16:50

I think the error you get when using only features A and B comes from the last line:

y_pred = regressor.predict(X_test)

It seems that you are using the wrong model to predict. You fit a model named tree_reg but then call predict on a different model, regressor (perhaps left over from some earlier data). That regressor model expects 7 features, but you are providing only 2.
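As a minimal sketch of that fix, the snippet below fits and predicts with the same tree_reg object. It assumes the four sample rows from the question stand in for the full dataframe, and the random_state arguments are my addition so the split is repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Sample rows copied from the question (A and B only).
df = pd.DataFrame({
    "Y": [11.3914, 14.0348, 14.8416, 13.7829],
    "A": [2.75, 2.50, 2.75, 2.25],
    "B": [0, 0, 1, 0],
})

X = df[["A", "B"]]
y = df["Y"]

# random_state is an assumption added here for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

tree_reg = DecisionTreeRegressor(random_state=0)
tree_reg.fit(X_train, y_train)

# Predict with the model that was just fitted, not a leftover `regressor`.
y_pred = tree_reg.predict(X_test)
```

Because the fitted model and the predicting model are the same object, the number of features seen at fit time and at predict time always agree.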

Error when using all the three features A, B and C

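The estimator expects every feature to be a single number, so a column whose cells hold lists cannot be used directly. When you want to use a list column from a data frame as features, you can call the tolist() method on that column and build a new dataframe from the result, which turns each list element into its own column.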
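Here is a rough sketch of the tolist() approach, assuming every list in C has the same length (7 in the sample rows from the question). The column names C0 to C6 and the random_state arguments are my own choices for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Sample rows copied from the question.
df = pd.DataFrame({
    "Y": [11.3914, 14.0348, 14.8416, 13.7829],
    "A": [2.75, 2.50, 2.75, 2.25],
    "B": [0, 0, 1, 0],
    "C": [
        [0, 0, 10, 17, 35, 26, 0],
        [0, 0, 39, 35, 30, 5, 0],
        [0, 0, 12, 5, 5, 2, 1],
        [0, 0, 2, 18, 14, 8, 0],
    ],
})

# Expand the list column into one numeric column per list position.
# Assumes all lists have equal length; names C0..C6 are illustrative.
c_expanded = pd.DataFrame(df["C"].tolist(), index=df.index).add_prefix("C")

# Combine the scalar features with the expanded columns.
X = pd.concat([df[["A", "B"]], c_expanded], axis=1)
y = df["Y"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

tree_reg = DecisionTreeRegressor(random_state=0)
tree_reg.fit(X_train, y_train)
y_pred = tree_reg.predict(X_test)
```

With this layout X has 9 purely numeric columns (A, B and C0 to C6), so the "setting an array element with a sequence" error should no longer occur.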

Split column of lists into multiple columns
