
I am training and testing a model on a text dataset, but loading the data raises the error below. The CSV file contains text values.

When I run the code, it gives the output:

ValueError: could not convert string to float: b'user1'

Here, user1 is a text value inside the dataset.

Code:

from keras.models import Sequential
from keras.layers.core import Dense
from sklearn.model_selection import train_test_split
import numpy as np


seed = 9
np.random.seed(seed)

dataset = np.loadtxt('E:/7th Semester/FYP/ini/New folder/MBAT/DataSet/train_data.csv', delimiter=',', skiprows=1)


X = dataset[:,0:8]
Y = dataset[:,8]

(X_train, X_test, Y_train, Y_test) = train_test_split(X, Y, test_size=0.33, random_state=seed)


model = Sequential()
model.add(Dense(8, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(6, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))


model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), nb_epoch=100, batch_size=5)

scores = model.evaluate(X_test, Y_test)
print ("Accuracy: %.2f%%" %(scores[1]*100))

Complete traceback:

File "C:\Users\Lenovo\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 725, in floatconv
    return float(x)

ValueError: could not convert string to float: b'user1'
amanb
dashti
  • Hi @dashti, can you share the full `Traceback` error? This will help identify which part of the code is causing the error. – amanb Dec 15 '18 at 11:07
  • Hi @amanb, here it is: File "C:\Users\Lenovo\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 725, in floatconv return float(x) ValueError: could not convert string to float: b'user1' – dashti Dec 15 '18 at 11:25
  • @dashti The **FULL** backtrace, and please in the question, not as a comment. – Matthieu Brucher Dec 15 '18 at 11:32
  • I just added the traceback to the question. @MatthieuBrucher – dashti Dec 15 '18 at 11:36
  • Doesn't a complete Traceback always start with `Traceback`? – user8408080 Dec 15 '18 at 12:26
  • Without a sample of the `csv` text file we can't help. The discussion indicates that this file not only has string columns, but has variable length rows, or missing values. – hpaulj Dec 15 '18 at 17:05
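
Since the real file was never posted, one way to check for the variable-length rows mentioned above is to count the fields on each line with the standard `csv` module. This is only a sketch: `csv_text` and its column names are hypothetical stand-ins for the actual CSV, and with a real file you would pass an open file object to `csv.reader` instead.

```python
import csv
import io

# Hypothetical sample standing in for the real CSV; the last row has an extra field.
csv_text = "user,a,b,label\nuser1,1,2,0\nuser2,3,4,1\nuser3,5,6,7,1\n"

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
expected = len(header)  # number of fields every data row should have

# Collect (line number, field count) for rows that do not match the header.
# Line numbers start at 2 because line 1 is the header.
bad_rows = [(line_no, len(row))
            for line_no, row in enumerate(reader, start=2)
            if len(row) != expected]

print(bad_rows)
```

Any rows reported this way would need to be fixed or dropped before the file can be loaded into a rectangular array.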

1 Answer


According to the official NumPy documentation, numpy.loadtxt() uses dtype=float by default, so it tries to convert every field to a float. user1 is a string that cannot be converted to float, which is why you are getting this error. You could try the following:

np.genfromtxt('/path/to/csv', dtype=None, delimiter=',', names=True, case_sensitive=True, invalid_raise=False)
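
To illustrate the difference, here is a minimal sketch on a small in-memory CSV. The data, the column names (`user`, `a`, `b`, `label`) and the `encoding="utf-8"` argument are assumptions for the example and are not taken from the asker's file.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV: one text column and three integer columns.
csv_text = "user,a,b,label\nuser1,1,2,0\nuser2,3,4,1\n"

# loadtxt defaults to dtype=float, so the text column raises ValueError.
try:
    np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# With dtype=None, genfromtxt infers a type per column; with names=True it
# returns a 1-D structured array whose fields are addressed by header name.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=None,
                     names=True, encoding="utf-8")

# Build a plain 2-D float matrix from the numeric fields only.
features = np.column_stack([data["a"], data["b"]]).astype(float)
labels = data["label"]
```

Note that the structured array is one-dimensional, so columns are selected by field name (`data["label"]`) rather than with two-dimensional slices such as `data[:, 3]`. Text columns still have to be encoded as numbers before they can be fed to a Keras model.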
amanb
  • I just added another approach using `np.genfromtxt()`. You could try that. – amanb Dec 15 '18 at 11:48
  • I got this error when I tried np.genfromtxt(): ValueError: Some errors were detected ! Line #5 (got 5 columns instead of 4) Line #8 (got 2 columns instead of 4) Line #11 (got 6 columns instead of 4) – dashti Dec 15 '18 at 11:59
  • If your csv has mixed dtypes, you should use `dtype=None`, I've edited the answer. Also, the error you are getting is due to the inconsistency detected in the number of columns. This [SO answer](https://stackoverflow.com/questions/23353585/got-1-columns-instead-of-error-in-numpy) suggests a workaround which I'm adding to my answer. – amanb Dec 15 '18 at 12:02
  • Please refer to the suggested answer to resolve the column inconsistencies. I've added the argument `invalid_raise=False` to my answer. However, unless we know what type of data exists in the csv, we cannot suggest what may work for you. – amanb Dec 15 '18 at 12:10
  • I have int and string data in the csv file. – dashti Dec 15 '18 at 12:23
  • Now the line `Y = dataset[:,8]` raises this error: IndexError: index 8 is out of bounds for axis 1 with size 4 – dashti Dec 15 '18 at 12:26
  • This most likely means there are 4 columns, but you have specified index 8, which is out of bounds. With 4 columns the valid indices are 0 to 3, so use 3 for the last column. – amanb Dec 15 '18 at 12:41
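
A way to avoid hard-coding column indices is to slice relative to the array's shape. The sketch below assumes a plain 2-D numeric array with the label in the last column; `dataset` here is a made-up 3×4 array, not the asker's data.

```python
import numpy as np

# Hypothetical 2-D array with 4 columns, standing in for the loaded data.
dataset = np.arange(12, dtype=float).reshape(3, 4)

n_cols = dataset.shape[1]  # 4, so valid column indices are 0..3

X = dataset[:, :-1]  # every column except the last
Y = dataset[:, -1]   # the last column, whatever its index is

print(X.shape, Y.shape)
```

If the model from the question is kept, `input_dim` in the first `Dense` layer would also need to match `X.shape[1]` rather than the hard-coded 8.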