When I try to fit the classifier, I get this error:

ValueError: could not convert string to float: '4/1/2010'

# Load the Pandas libraries with alias 'pd'
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from math import sqrt
from ml_metrics import rmse

# Read data from file 'filename.csv'
# (in the same directory that your python process is based)
# Control delimiters, rows, column names with read_csv (see later)
data = pd.read_csv("NASDAQ.csv")
data.dropna(inplace=True)

#df.drop_duplicates(inplace=True)
nInstances, nAttributes = data.shape
if data.shape[0]:
    train = data[:1762]
    test = data[1762:]
x_train= train.values[:,0:nAttributes-1]
y_train= train.values[:,nAttributes-1]


# classifiers: Linear Regression, Logistic Regression, kNN, SVM and MLP
clf = LinearRegression().fit(x_train, y_train)

Could you please check this and help me figure out where the issue is?

smac89
spyros
  • Can you print the stack trace that came with the error? Usually, it contains the line in your program that's causing the problem. – Green Cloak Guy May 28 '19 at 19:37
  • The error tells you the issue. Your data contains string date values like `'4/1/2010'`, but sklearn only takes numeric inputs – G. Anderson May 28 '19 at 19:38
  • @G.Anderson How can I convert it to numeric? I have tried, without success. Do you have any idea? – spyros May 28 '19 at 19:44
  • @GreenCloakGuy File "C:/Users/spyros/Downloads/Dialexi 5/Ασκηση.py", line 26, in clf = LinearRegression().fit(x_train, y_train) ValueError: could not convert string to float: '4/1/2010' – spyros May 28 '19 at 19:45
  • [This question and answer](https://stackoverflow.com/questions/16453644/regression-with-date-variable-using-scikit-learn) have some good tips and tricks if you want to keep the date as a feature. You could also try removing the date column entirely unless you feel it will help your model without introducing data leakage – G. Anderson May 28 '19 at 19:49
  • Just pick a reference date (the mean, the median, or any other date) and encode every date as its distance from that reference, in days, hours, minutes, ... or as a timestamp difference. That way you get continuous values. – OSainz May 28 '19 at 20:11
  • Thank you all. My professor made a mistake in the project. He corrected it today, so I figured it out. Thank you all – spyros May 29 '19 at 18:09
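
The reference-date encoding suggested in the comments could look roughly like the sketch below. The column names `Date` and `Close`, the sample values, and the month/day/year reading of `'4/1/2010'` are all assumptions, since the contents of `NASDAQ.csv` are not shown in the question.

```python
import pandas as pd

# Hypothetical sample rows; the real NASDAQ.csv columns are not shown in the question.
df = pd.DataFrame({
    "Date": ["4/1/2010", "4/5/2010", "4/6/2010"],
    "Close": [100.0, 101.5, 102.0],
})

# Parse the strings into datetimes (assumes month/day/year order).
dates = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Encode each date as the number of days since a reference date.
# The earliest date is used here; the mean or median would work the same way.
reference = dates.min()
df["days_since_start"] = (dates - reference).dt.days

# Drop the original string column so only numeric features remain.
features = df.drop(columns=["Date"])
print(features.dtypes)
```

If the file actually stores day/month/year, the `format` string would need to be `"%d/%m/%Y"` instead.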

1 Answer

Specifically for the date column, you can pass `parse_dates=['column name']` to `read_csv` when reading the CSV, as suggested by this answer.
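
A minimal sketch of that approach follows, with an in-memory CSV standing in for `NASDAQ.csv`; the column names and values are hypothetical. Note that `parse_dates` only yields datetime values, so the sketch adds one more step that is an assumption on my part and not something the answer states: converting the datetimes to integers (here, date ordinals) before handing the array to scikit-learn.

```python
import io

import pandas as pd

# Hypothetical stand-in for NASDAQ.csv; the column names are assumptions.
csv_text = "Date,Open,Close\n4/1/2010,100.0,101.5\n4/5/2010,101.5,102.0\n"

# parse_dates turns the named column into datetime64 values instead of strings.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Convert each datetime to its proleptic Gregorian ordinal (a plain integer day count).
data["Date"] = data["Date"].map(pd.Timestamp.toordinal)

# Same split as in the question: all columns but the last are features.
x = data.iloc[:, :-1].to_numpy(dtype=float)
y = data.iloc[:, -1].to_numpy(dtype=float)
```

With every column numeric, `x` and `y` can be passed to `LinearRegression().fit(x, y)` without the string-to-float error.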

Abhineet Gupta