1

I am new to machine learning with Python and pandas DataFrames. I have trained my model and am making predictions on x_test (a DataFrame). I want to make a prediction for each row (sample) in x_test and append that row to a new DataFrame (new_train) if the predicted value is less than some threshold (0.4). I have sketched my idea below. Could you please help me out?

 c = XGBRegressor()  
 dt = c.fit(x_train, y_train)

 new_train = pd.DataFrame()  

 for rows in x_test:  
     y_pred = c.predict(x_test[rows])  
     if y_pred < 0.4:
           new_train.append(x_test[rows])

3 Answers

1

You basically have it figured out already; it just needs a few tweaks. You can use iloc this way:

 for i in range(x_test.shape[0]):
     row_i = x_test.iloc[i]  # a row in x_test
     y_pred = c.predict(row_i)
     if y_pred < 0.4:
         new_train = new_train.append(row_i)

Or use it this way

 for i in range(len(x_test)):
     row_i = x_test.iloc[i, :]  # a row in x_test
     y_pred = c.predict(row_i)
     if y_pred < 0.4:
         new_train = new_train.append(row_i)

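In both versions, row_i is of type <class 'pandas.core.series.Series'>.

Note that the .append() method on a pd.DataFrame object is not an in-place operation: it returns a new DataFrame, so the result has to be assigned back (new_train = new_train.append(row_i)). See here for more.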
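
To illustrate the "not in-place" point: the sketch below uses pd.concat, because DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. The toy x_test and its column names are assumptions made up for the example, not the asker's data.

```python
import pandas as pd

# Toy data, hypothetical: stands in for the asker's x_test.
x_test = pd.DataFrame({"f1": [0.1, 0.5, 0.9], "f2": [1.0, 2.0, 3.0]})

new_train = pd.DataFrame()
row = x_test.iloc[[0]]  # double brackets keep a one-row DataFrame

# pd.concat returns a NEW DataFrame; new_train itself is not modified.
result = pd.concat([new_train, row])
print(len(new_train), len(result))

# Inside a loop it is usually cheaper to collect the rows in a list
# and concatenate once at the end.
rows = [x_test.iloc[[i]] for i in range(len(x_test))]
combined = pd.concat(rows)
```

If the return value is not assigned back, the row is silently dropped, which would match the "nothing is getting appended" symptom.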

semore_1267
  • Neither is working as I need. The first one executes, but nothing gets appended to the new_train DataFrame (checked for both y_pred < 0.4 and y_pred > 0.4), and the second one gives me a ValueError. Thanks for the effort, but could you please figure out where it goes wrong? – Novice_Developer Nov 05 '17 at 01:42
  • Hard to say without more info. All the above answer is doing is: (1) for each row in my dataframe, (2) grab that row, (3) send that row to my classifier (be careful of types here, arrays vs. Series), (4) give me the output of the prediction. Follow that step by step and you should be fine. – semore_1267 Nov 08 '18 at 03:22
0

I had a very similar problem and found this Stack Overflow page while googling. I know it is an old question, but since I resolved it, I will answer. It is very simple; here is what I did.

Use a one-row DataFrame

x_test.iloc[[i]]

instead of a Series

x_test.iloc[i]
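
A minimal sketch of the difference, using a made-up two-column x_test (an assumption for illustration): single brackets return a 1-D Series, while double brackets return a 2-D, one-row DataFrame, which is the (n_samples, n_features) layout that predict() methods generally expect.

```python
import pandas as pd

# Toy data, hypothetical: stands in for the real x_test.
x_test = pd.DataFrame({"f1": [0.1, 0.5], "f2": [1.0, 2.0]})

as_series = x_test.iloc[0]    # 1-D: one value per column
as_frame = x_test.iloc[[0]]   # 2-D: one row, all columns

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (1, 2)
```
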
Jun Tanaka
-1

I think this is what you're looking for:

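    for i in range(len(X_test)):
        # Turn the i-th row (a Series) back into a one-row DataFrame
        row = X_test.iloc[i, :].to_frame().T
        y_pred = forest.predict(row)
        # predict returns an array; item(0) extracts the single value
        if y_pred.item(0) < 0.4:
            new_train = new_train.append(row)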
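
As an alternative to the row-by-row loop, the same selection can probably be done in one step by predicting on the whole frame and filtering with a boolean mask. This is only a sketch: MeanModel is a hypothetical stand-in for the fitted regressor (it just returns each row's mean), and the toy X_test is made up.

```python
import numpy as np
import pandas as pd


class MeanModel:
    """Hypothetical stand-in for a fitted regressor: predicts the row mean."""

    def predict(self, X):
        return np.asarray(X, dtype=float).mean(axis=1)


model = MeanModel()
X_test = pd.DataFrame({"f1": [0.1, 0.5, 0.2], "f2": [0.3, 0.9, 0.4]})

y_pred = model.predict(X_test)    # one prediction per row
new_train = X_test[y_pred < 0.4]  # keep only rows predicted below the threshold
print(new_train)
```

With a real model, only the predict call changes; the mask works the same way as long as predict returns one value per row.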
RAM