
Hello, I have this code:

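import warnings

import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split

# Suppress SciPy's "internal gelsd" warning
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")

# Load the training data (Windows-1250 encoded CSV)
df = pd.read_csv("datatrain.csv", sep=",", encoding="windows-1250")

# Keep only the three features and the label, then drop incomplete rows
df = df[["FEATURE1", "FEATURE2", "FEATURE3", "LABEL"]]
df.dropna(inplace=True)
print(df.head())

# Features and target as NumPy arrays (axis passed by keyword; the positional
# form df.drop(['LABEL'], 1) no longer works in pandas 2.x)
X = np.array(df.drop(columns=["LABEL"]))
y = np.array(df["LABEL"])

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X_train[:-500], y_train[:-500])

accuracy = clf.score(X_test, y_test)
print("accuracy: ", accuracy)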

My dataset is big (more than 150K lines), but as you can see I'm only using the first 500 lines. When I run the code, print(df.head()) runs, but after that I only get a bouncing Python rocket icon in my Dock and nothing else happens.

Can you tell me why that is? Thank you!

(Screenshot: bouncing Python icon)

solarenqu

1 Answer


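Your slice X_train[:-500] selects every row except the last 500, not the first 500. To train on only the first 500 rows, it should be clf.fit(X_train[:500], y_train[:500]).

With the original slice, SVC is fitted on roughly 100K rows, and scikit-learn's documentation notes that SVC's fit time scales at least quadratically with the number of samples, so the script is most likely still busy training rather than frozen.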
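A quick way to see the difference is to compare both slices on a NumPy array. The sketch below assumes about 150,000 rows and the 70/30 split from the question, so the training set has 105,000 rows; the zero-filled array is only a stand-in for X_train, not the real data.

```python
import numpy as np

# Small example: [:n] keeps the first n items, [:-n] drops the last n items
a = np.arange(10)
print(a[:3])   # [0 1 2]
print(a[:-3])  # [0 1 2 3 4 5 6]

# Stand-in for X_train (assumed): 150,000 rows * 0.7 = 105,000 rows, 3 features
X_train = np.zeros((105_000, 3))

first_500 = X_train[:500]        # the first 500 rows
all_but_last = X_train[:-500]    # every row except the last 500

print(first_500.shape)     # (500, 3)
print(all_but_last.shape)  # (104500, 3)
```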

See this answer for a detailed explanation of how to get the n-th last element with slicing.
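For reference, negative indices count from the end of a sequence. A short sketch on a plain Python list (the list `rows` and the value of `n` are hypothetical):

```python
rows = list(range(10))  # hypothetical data: [0, 1, ..., 9]
n = 3

nth_last = rows[-n]          # single n-th last element
last_n = rows[-n:]           # the last n elements
all_but_last_n = rows[:-n]   # everything except the last n elements
first_n = rows[:n]           # the first n elements

print(nth_last)        # 7
print(last_n)          # [7, 8, 9]
print(all_but_last_n)  # [0, 1, 2, 3, 4, 5, 6]
print(first_n)         # [0, 1, 2]
```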

Maximilian Peters