
Code:

import csv
import numpy
raw_data = open('C:\\Users\\train.csv', 'rt')
data = numpy.loadtxt(raw_data, delimiter=",")
print(data.shape)

Below is the sample data used

Time    Freq
8:00    91.1
8:03    91.1
8:06    91.1
8:09    91.1
8:12    91.1
8:15    91.1
8:18    91.1
8:21    91.1
8:24    91.1
8:27    91.1
8:30    91.1

Error:
ValueError: could not convert string to float: b'Time'
hpaulj
Ak Ash
  • What is the question? The error/exception is pretty unambiguous. Does `numpy.loadtxt` have an optional parameter telling it to skip a header line? It isn't clear from your sample data that the first two words are on their own line. Please copy and paste the sample data and format it as code (select it and press `ctrl-k`). – wwii Apr 26 '18 at 20:11
  • As a default `loadtxt` loads the data as floats, and raises an error when it can't. `genfromtxt` puts `nan` where it can't create the float. What do you want the result to look like? – hpaulj Apr 26 '18 at 20:22

2 Answers

In [350]: txt ='''Time    Freq
     ...: 8:00    91.1
     ...: 8:03    91.1
     ...: 8:06    91.1
     ...: 8:09    91.1
     ...: 8:12    91.1
     ...: 8:15    91.1
     ...: 8:18    91.1
     ...: 8:21    91.1
     ...: 8:24    91.1
     ...: 8:27    91.1
     ...: 8:30    91.1
     ...: '''

Loading as a structured array, using the first line as field names.

In [351]: data = np.genfromtxt(txt.splitlines(),names=True,dtype=None,encoding=None)
In [352]: data
Out[352]: 
array([('8:00', 91.1), ('8:03', 91.1), ('8:06', 91.1), ('8:09', 91.1),
       ('8:12', 91.1), ('8:15', 91.1), ('8:18', 91.1), ('8:21', 91.1),
       ('8:24', 91.1), ('8:27', 91.1), ('8:30', 91.1)],
      dtype=[('Time', '<U4'), ('Freq', '<f8')])
In [353]: data['Freq']
Out[353]: array([91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1, 91.1])

Note that the 2nd column has been loaded as numbers, but the first as strings.

hpaulj

By default, numpy.loadtxt expects every field in the file to be a number. Time is not a number, and neither is 8:00. If you want to perform numerical operations on your data, you need to skip the Time Freq header line and convert your times to numbers.
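As a rough sketch of that approach, `loadtxt` accepts `skiprows` to drop the header line and `converters` to turn a column into numbers. The helper `time_to_minutes` and the inline `sample` text are hypothetical stand-ins for illustration, and the comma delimiter is assumed from the `delimiter=","` in the question's code; adjust it if the real file is whitespace-separated.

```python
import io
import numpy as np

# Assumed stand-in for train.csv, using the comma delimiter from the question's code.
sample = """Time,Freq
8:00,91.1
8:03,91.1
8:06,91.1
"""

def time_to_minutes(value):
    # Hypothetical helper: convert "H:MM" into minutes since midnight.
    # Depending on the NumPy version, converters may receive bytes or str.
    if isinstance(value, bytes):
        value = value.decode()
    hours, minutes = value.split(":")
    return int(hours) * 60 + int(minutes)

# skiprows=1 drops the "Time,Freq" header; the converter handles column 0.
data = np.loadtxt(
    io.StringIO(sample),
    delimiter=",",
    skiprows=1,
    converters={0: time_to_minutes},
)
print(data.shape)
```

Minutes since midnight is only one possible numeric encoding of the time column; pick whatever unit suits the analysis.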

If you don't need to do any numerical analysis, you can import the data as strings: numpy.loadtxt(raw_data, delimiter=",", dtype='str'). See the docs for more info.
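A minimal sketch of the string import, assuming a comma-separated file like the one implied by the question's code (the inline `sample` text is a made-up stand-in for the real file). The header row is kept as an ordinary row of strings, and a single column can still be converted to floats afterwards if needed.

```python
import io
import numpy as np

# Assumed stand-in for train.csv.
sample = """Time,Freq
8:00,91.1
8:03,91.1
"""

# Every field, including the header, is loaded as a string.
rows = np.loadtxt(io.StringIO(sample), delimiter=",", dtype=str)
print(rows.shape)

# Optional: skip the header row and convert only the Freq column to floats.
freq = rows[1:, 1].astype(float)
print(freq)
```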


Alternatively, you can use genfromtxt.

hostingutilities.com