1

I created the table below in Google Sheets and downloaded it as a CSV file.

[image: screenshot of the table]

My code is posted below. I'm really not sure where it's failing. I tried highlighting and running the code line by line, and it keeps throwing the same error.

# Data Preprocessing

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values

# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 1:5 ])
X[:, 1:6] = imputer.transform(X[:, 1:5])

The error I'm getting is:

Could not convert string to float: 'Illinois'

I also have this line above my error message

array = np.array(array, dtype=dtype, order=order, copy=copy)

It seems like my code is not able to read my GPA column which contains floats. Maybe I didn't create that column right and have to specify that they're floats?

*** I'm updating with the full error message:

     [15]: runfile('/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/data_preprocessing_template2.py', wdir='/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing')
Traceback (most recent call last):

  File "<ipython-input-15-5f895cf9ba62>", line 1, in <module>
    runfile('/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/data_preprocessing_template2.py', wdir='/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing')

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/data_preprocessing_template2.py", line 16, in <module>
    imputer = imputer.fit(X[:, 1:5 ])

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/imputation.py", line 155, in fit
    force_all_finite=False)

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)

ValueError: could not convert string to float: 'Illinois'
wolfbagel
  • Use X[:,2:] as float values are from 3rd column onwards – bigbounty Dec 21 '17 at 03:01
  • Why not put the line that generates the error in your question? – JMA Dec 21 '17 at 03:04
  • *"I'm really not sure where it's failing. [...] The error I'm getting is [...]"* Please include the complete traceback (i.e. the complete error message) in the question. It will tell you where the code is failing. – Warren Weckesser Dec 21 '17 at 03:15
  • Hi @WarrenWeckesser I've updated my post with the full error. Thank you. – wolfbagel Dec 21 '17 at 03:57
  • @newcoder you still haven't pasted the error message fully. I recreated your case and ran it to see the full error message. Please see my answer. – FatihAkici Dec 21 '17 at 04:00
  • @bigbounty I've been following a tutorial and they use X[:, 1:5 ]. I still need Index 0 and 1 from my column to be in X, so if I use X[:,2:], won't that exclude them? – wolfbagel Dec 21 '17 at 04:00

2 Answers

3

The full error you are getting is the following (it would have helped tremendously had you pasted it in full):

Traceback (most recent call last):

  File "<ipython-input-7-6a92ceaf227a>", line 8, in <module>
    imputer = imputer.fit(X[:, 1:5 ])

  File "C:\Users\Fatih\Anaconda2\lib\site-packages\sklearn\preprocessing\imputation.py", line 155, in fit
    force_all_finite=False)

  File "C:\Users\Fatih\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)

ValueError: could not convert string to float: Illinois

If you look carefully, the traceback points out where the code is failing:

imputer = imputer.fit(X[:, 1:5 ])

This fails because you are trying to take the mean of a categorical variable, which doesn't make sense. This has already been asked and answered in this StackOverflow thread.
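As a rough sketch of the idea (not the tutorial's method), the mean can be computed over the numeric columns only, leaving text columns such as the one holding 'Illinois' untouched. The DataFrame below is a hypothetical stand-in for Data2.csv, and the column names are assumptions, since the real table is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Data2.csv; the real column names are not known.
dataset = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Dee"],
    "State": ["Illinois", "Ohio", "Illinois", "Texas"],
    "Age": [20.0, np.nan, 22.0, 24.0],
    "GPA": [3.5, 3.0, np.nan, 4.0],
    "Admitted": ["Yes", "No", "Yes", "No"],
})

# Select only the numeric columns; a text column has no mean.
numeric_cols = dataset.select_dtypes(include="number").columns

# Fill each missing value with the mean of its own column.
imputed = dataset.copy()
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].mean())

print(imputed)
```

This version uses pandas alone. In newer scikit-learn releases, `Imputer` has been replaced by `SimpleImputer` in `sklearn.impute`, and the same rule applies there: pass it only the numeric columns.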

FatihAkici
  • Ok thank you, will make sure to post the whole error next time. – wolfbagel Dec 21 '17 at 04:06
  • @newcoder Humbly, I highly recommend that you run your script one line at a time, as opposed to running the entire script at once, during prototyping, development, or learning. That way you are in full control of what each line of code is **actually** doing, and it also makes debugging so much easier. I am glad I was able to help! – FatihAkici Dec 21 '17 at 04:10
-2

Replace the line:

dataset = pd.read_csv('Data2.csv')

with:

dataset = pd.read_csv('Data2.csv', delimiter=";")
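One way to check whether the delimiter is actually the problem is to compare the shape of the parsed frame under each setting. The snippet below is only a sketch: it uses a small in-memory, semicolon-separated sample with made-up columns rather than the asker's file.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample; not the asker's actual data.
raw = "State;Age;GPA\nIllinois;20;3.5\nOhio;21;3.0\n"

# With the default comma delimiter, every row collapses into a single column.
default_read = pd.read_csv(io.StringIO(raw))

# With the matching delimiter, the three columns are split correctly.
semicolon_read = pd.read_csv(io.StringIO(raw), delimiter=";")

print(default_read.shape)    # (2, 1)
print(semicolon_read.shape)  # (2, 3)
print(semicolon_read.dtypes)
```

If `dataset.shape` already reports the expected number of columns with the default settings, the file is comma-separated and changing the delimiter is unlikely to help.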
Daria Pydorenko