Could not convert string to float while using numpy.loadtxt

Question

Code:

import csv
import numpy
raw_data = open('C:\\Users\\train.csv', 'rt')
data = numpy.loadtxt(raw_data, delimiter=",")
print(data.shape)

Below is the sample data used

Time    Freq
8:00    91.1
8:03    91.1
8:06    91.1
8:09    91.1
8:12    91.1
8:15    91.1
8:18    91.1
8:21    91.1
8:24    91.1
8:27    91.1
8:30    91.1

Error:
ValueError: could not convert string to float: b'Time'

What is the question? The error/exception is pretty unambiguous. Does `numpy.loadtxt` have an optional parameter telling it to skip a header line? It isn't clear from your sample data that the first two words are on their own line. Please copy and paste the sample data and format it as code (select it and press `ctrl-k`). — wwii, Apr 26 '18 at 20:11
As a default `loadtxt` loads the data as floats, and raises an error when it can't. `genfromtxt` puts `nan` where it can't create the float. What do you want the result to look like? — hpaulj, Apr 26 '18 at 20:22

score 2 · Answer 1 · answered Apr 26 '18 at 20:34

In [350]: txt ='''Time    Freq
     ...: 8:00    91.1
     ...: 8:03    91.1
     ...: 8:06    91.1
     ...: 8:09    91.1
     ...: 8:12    91.1
     ...: 8:15    91.1
     ...: 8:18    91.1
     ...: 8:21    91.1
     ...: 8:24    91.1
     ...: 8:27    91.1
     ...: 8:30    91.1
     ...: '''

Loading as a structured array, using the first line as field names.

In [351]: data = np.genfromtxt(txt.splitlines(), names=True, dtype=None, encoding=None)
In [352]: data
Out[352]: 
array([('8:00', 91.1), ('8:03', 91.1), ('8:06', 91.1), ('8:09', 91.1),
       ('8:12', 91.1), ('8:15', 91.1), ('8:18', 91.1), ('8:21', 91.1),
       ('8:24', 91.1), ('8:27', 91.1), ('8:30', 91.1)],
      dtype=[('Time', '<U4'), ('Freq', '<f8')])
In [353]: data['Freq']
Out[353]: array([91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1])

Note that the 2nd column has been loaded as numbers, but the first as strings.

hostingutilities.com · Answer 2 · 2018-04-26T20:30:05.800

By default, numpy.loadtxt expects every field in the file to be a number. Time is not a number, and 8:00 is not a number either. If you want to perform numerical operations on your data, you need to skip the Time Freq header line and convert your times to numbers.
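One way to do both steps inside `loadtxt` itself is sketched below, using its `skiprows` and `converters` parameters. The helper `to_minutes` and the choice of "minutes since midnight" as the numeric form are assumptions made for illustration, and the sample is whitespace-separated as posted; pass `delimiter=","` if the real file uses commas.

```python
import io
import numpy as np

# A few rows of the sample data from the question (whitespace-separated).
sample = """Time    Freq
8:00    91.1
8:03    91.1
8:06    91.1
"""

def to_minutes(field):
    """Hypothetical helper: turn 'H:MM' into minutes since midnight."""
    # Older NumPy versions may hand converters bytes instead of str.
    if isinstance(field, bytes):
        field = field.decode()
    hours, minutes = field.split(":")
    return float(int(hours) * 60 + int(minutes))

data = np.loadtxt(
    io.StringIO(sample),
    skiprows=1,                  # skip the "Time Freq" header line
    converters={0: to_minutes},  # column 0 holds the clock times
    encoding=None,
)
print(data.shape)
print(data[:, 0])
```

Both columns end up as floats, so the result is an ordinary 2-D array rather than a structured one.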

If you don't need to do any numerical analysis, you can import the data as strings: numpy.loadtxt(raw_data, delimiter=",", dtype='str'). See the docs for more info.
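A minimal sketch of the string import, again assuming the whitespace-separated sample as posted (so no `delimiter` argument is passed). Note that the header line is kept as an ordinary row of strings, and a single column can still be converted afterwards.

```python
import io
import numpy as np

# A few rows of the sample data from the question (whitespace-separated).
sample = """Time    Freq
8:00    91.1
8:03    91.1
8:06    91.1
"""

# With dtype=str nothing is parsed as a float, so the header loads too.
rows = np.loadtxt(io.StringIO(sample), dtype=str)
print(rows.shape)
print(rows[0])

# Convert only the Freq column, dropping the header row first.
freq = rows[1:, 1].astype(float)
print(freq)
```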

Alternatively, you can use genfromtxt.
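As a rough illustration of how `genfromtxt` differs with its default float dtype: fields it cannot convert become `nan` instead of raising an error. The sketch below assumes the whitespace-separated sample as posted and uses `skip_header=1` to drop the header line.

```python
import io
import numpy as np

# A few rows of the sample data from the question (whitespace-separated).
sample = """Time    Freq
8:00    91.1
8:03    91.1
8:06    91.1
"""

# Default dtype is float; unparseable fields such as "8:00" become nan.
data = np.genfromtxt(io.StringIO(sample), skip_header=1)
print(data.shape)
print(np.isnan(data[:, 0]).all())  # the Time column could not be parsed
print(data[:, 1])
```

To keep the times as text rather than `nan`, pass `names=True, dtype=None, encoding=None` so each column gets its own inferred type.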
