I'm trying to convert specific columns from string to float in Python but I'll always end up with the error:
cannot convert string to float: 'Tour Delay Minutes'
Tour Delay Minutes is the name of a specific column and contains values such as 6.31 or integers (in case the result is an integer) like 9,10 etc. My code is:
import pandas as pd
import numpy as np
data = pd.read_csv('H:\\testing.csv',thousands = ',')
data.drop([0], axis=1) #Removes the header? based on another post
cols=['Tour Delay Minutes','Passenger Delay Minutes','Driver Delay Minutes','Engine Failures','Vehicle Failures'] #Columns containing ints and floats
for col in cols: #Loop to transform all column strings to floats by default
data[col]= data[col].astype(dtype=np.float64)
data.info()
The dtypes specified at point of loading are:
Unnamed: 0 int64
Time Period object #contains day,midday,early afternoon
Tour Number object #contains integers
Tour Delay Minutes object #contains float numbers
Passenger Delay Minutes object #contains float numbers
Driver Delay Minutes object #contains float numbers
Engine Failures object #contains integer numbers
Vehicle Failures object #contains integer numbers
I suppose the error will also apply for all other columns marked as object (as shown above) and that's because Python tries to convert the header as well (row 1). Any workarounds on this please? I've also tried the below code, however it didn't work:
data['Tour Delay Minutes'].astype(str).astype(float)
Edit: Sample dataset added to assist in finding the solution - PLease see link:
https://i.stack.imgur.com/o4zcX.png
Unnamed: 0 (index) Time Period Tour Number Tour Delay Minutes Passenger Delay Minutes Driver Delay Minutes Engine Failures Vehicle Failures
0 2018/19-P08 261803 11 6 5 2 0
1 2018/19-P08 325429 16 12 4 0 0
2 2018/19-P08 359343 14 5 9 0 0
3 2018/19-P08 366609 18 10 8 0 0
4 2018/19-P08 370697 63 37 26 2 0
5 2018/19-P08 392535 1474 140 1334 37.1194012 0.022591857
6 2018/19-P09 394752 0 0 0 0 0
7 2018/19-P09 408713 31 13 18 1.25 0
8 2018/19-P09 433763 62 49 13 4.766666667 1
9 2018/19-P09 440100 0 0 0 1 1
10 2018/19-P09 440258 17 14 3 1 0
11 2018/19-P10 440280 46 46 0 2.933333333 2
12 2018/19-P10 440929 22 7 15 1 0
13 2018/19-P10 441110 26 13 13 0 0
14 2018/19-P10 441585 4 0 4 0 0
15 2018/19-P10 442092 39 12 27 1.923076923 0
16 2018/19-P11 442105 0 0 0 0 0
17 2018/19-P11 442173 3 0 3 0 0
18 2018/19-P11 443580 4 2 2 0.428571429 0
19 2018/19-P11 443594 3 2 1 0.285714286 0
20 2018/19-P12 443599 2 1 1 0.285714286 0
21 2018/19-P12 443709 5 0 5 0 0
22 2018/19-P12 443885 3 0 3 0 0
23 2018/19-P12 444040 15 9 6 0.857142857 0
24 2018/19-P12 445021 3 0 3 0 0
Edit 2: Added actual sample dataset - Image link still available