
I am reading CSV data to build a model.

I understand how to handle missing values, so I have filled them using the median and zero, and dropped a few parameters that are of no interest.

I manually checked the CSV file by filtering for empty values and filled every field that came up empty, but I am still getting the error shown below.

Here is my code:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

dataset = pd.read_csv("model__newdata.csv",header = 0)

#Data Pre-processing: drop the target and the columns of no interest
data = dataset.drop(columns=['shift_location_id', 'status', 'city', 'open_positions'])

#Find median for features having NaN
median_role_id = data['role_id'].median()
median_specialty_id = data['specialty_id'].median()
median_shift_id = data['shift_id'].median()

#Fill NaN: median for the id columns, zero for years of experience
data['shift_id'] = data['shift_id'].fillna(median_shift_id)
data['role_id'] = data['role_id'].fillna(median_role_id)
data['specialty_id'] = data['specialty_id'].fillna(median_specialty_id)
data['years_of_experience'] = data['years_of_experience'].fillna(0)

#Start training

labels = dataset.shift_location_id
train1 = data
algo = LinearRegression()
x_train , x_test , y_train , y_test = train_test_split(train1 , labels , test_size = 0.20,random_state =1)

# x_train.to_csv("x_train.csv", sep=',', encoding='utf-8')
# x_test.to_csv("x_test.csv", sep=',', encoding='utf-8')

algo.fit(x_train,y_train)
algo.score(x_test,y_test)

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-27-99f96096832a> in <module>
     32 # x_test.to_csv("x_test.csv", sep=',', encoding='utf-8')
     33 
---> 34 algo.fit(x_train,y_train)

ValueError: could not convert string to float: 'none'

Any suggestions on how to resolve this?

Edit 1 - Sample data - https://gist.githubusercontent.com/karimkhanvi/d69c98352aaaaed87f787a20c05307f8/raw/a45bb471fc1ee5095a1d0c3809a8362c001f639e/temp.csv

Edit 2 - I already checked "ValueError: could not convert string to float: id" before posting.

I would appreciate it if you would note that my issue is not with the data type of any parameter.

ValueError: could not convert string to float: 'none'

I am facing this issue due to empty values. I have tried to deal with them, but that did not solve my problem, which is why I posted this question.
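For reference, here is a minimal sketch of one way to treat a literal 'none' string as a missing value when the file is read. It assumes the CSV uses the text 'none' as a placeholder for empty fields, which the error message suggests but does not confirm. The two-row sample is hypothetical and only stands in for the real file.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real CSV; the literal
# string 'none' is assumed to mark a missing value.
sample = io.StringIO(
    "role_id,years_of_experience\n"
    "3,none\n"
    "none,5\n"
)

# na_values tells read_csv to parse 'none' as NaN, so the columns stay numeric.
df = pd.read_csv(sample, header=0, na_values=["none"])

print(df.dtypes)
print(df.isnull().sum())

# After that, the usual median / zero filling applies.
df["role_id"] = df["role_id"].fillna(df["role_id"].median())
df["years_of_experience"] = df["years_of_experience"].fillna(0)
```

If the placeholder in the real file is spelled differently, the entry in `na_values` would need to match it exactly.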

Edit 3 - I tried to check whether any value is null:

data.isnull().values.any()
data.isnull().sum()

These give False and

shift_id                 0
user_id                  0
shift_organization_id    0
shift_department_id      0
role_id                  0
specialty_id             0
years_of_experience      0
nurse_zip                0
shifts_zip               0
dtype: int64
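A null check like the one above does not flag a string such as 'none', because pandas treats it as an ordinary value rather than NaN. The following is a rough sketch of how one might list the entries in each column that cannot be parsed as numbers. The small frame and the helper name `non_numeric_entries` are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical frame standing in for `data`; 'none' is a string, not NaN.
data = pd.DataFrame({
    "role_id": ["3", "none", "7"],
    "nurse_zip": [94016, 10001, 60601],
})

# The string 'none' is not counted as null here.
print(data.isnull().values.any())

def non_numeric_entries(frame):
    """Return {column: sorted unique values that cannot be parsed as numbers}."""
    bad = {}
    for col in frame.columns:
        converted = pd.to_numeric(frame[col], errors="coerce")
        # Values that became NaN only through coercion were non-numeric strings.
        mask = converted.isna() & frame[col].notna()
        if mask.any():
            bad[col] = sorted(frame.loc[mask, col].astype(str).unique())
    return bad

offenders = non_numeric_entries(data)
print(offenders)
```

Running the same helper on the real `data` frame would show which columns, if any, still hold text values before `fit` is called.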
  • https://stackoverflow.com/questions/8420143/valueerror-could-not-convert-string-to-float-id – E.Serra Mar 18 '19 at 12:02
  • Show your CSV data – Alderven Mar 18 '19 at 12:08
  • @Alderven: Please check link in edit – user2129623 Mar 18 '19 at 12:43
  • Possible duplicate of [ValueError: could not convert string to float: id](https://stackoverflow.com/questions/8420143/valueerror-could-not-convert-string-to-float-id) – ivan_pozdeev Mar 18 '19 at 12:45
  • @ivan_pozdeev: please check edit 2 – user2129623 Mar 18 '19 at 13:03
  • WFM with your data and code if I use `dataset = pd.read_csv("model__newdata.csv",header = 0, dialect = 'excel-tab')` (since the csv you gave is tab-separated). Python 2.7.15 win64, latest `pandas`, `numpy`, `scipy` and `sklearn` (installed with `--force-reinstall`). – ivan_pozdeev Mar 18 '19 at 14:02
  • @ivan_pozdeev: When i add `excel-tab`, it says `KeyError: "['shift_location_id'] not found in axis"` while reading first value. Also this is purely csv file. not excel converted – user2129623 Mar 18 '19 at 14:42
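The last two comments disagree about the delimiter. One quick check, sketched below under the assumption that the header line contains no quoted fields, is to count candidate separators in the header and pass the most frequent one to `read_csv`. A `KeyError` on a column name can appear when the separator does not match the file, since the whole header is then parsed as a single column. The sample text is hypothetical.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real file contents.
raw = "shift_id\tshift_location_id\tstatus\n1\t10\topen\n2\t11\tclosed\n"

# Count each candidate delimiter in the header line and keep the most frequent.
header = raw.splitlines()[0]
counts = {d: header.count(d) for d in [",", "\t", ";"]}
sep = max(counts, key=counts.get)

df = pd.read_csv(io.StringIO(raw), header=0, sep=sep)
print(repr(sep), list(df.columns))
```

For a file on disk, the header line can be read with `open(path).readline()` instead of the in-memory string.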