This is my code:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
print(train_data.head())
print('\nShape of training data :',train_data.shape)
print('\nShape of testing data :',test_data.shape)
train_x = train_data.drop(columns=['pHSWS25'],axis=1)
train_y = train_data['pHSWS25']
print(train_x.head())
print(train_y.head())
LinearRegression().fit(train_x,train_y)
When I run it I get:
Section Longitude Latitude ... Alkalinity pHSWS25 TCO2
0 06GA19960613 64.87 81.38 ... 2236.3 7.79776 2056.6
1 06GA19960613 64.87 81.38 ... 2234.4 7.78997 2068.4
2 06GA19960613 64.87 81.38 ... 2247.1 7.74140 2104.1
3 06GA19960613 64.87 81.38 ... 2254.1 7.71428 2120.5
4 06GA19960613 64.87 81.38 ... 2270.4 7.69494 2131.7
[5 rows x 18 columns]
('\nShape of training data :', (87099, 18))
('\nShape of testing data :', (171921, 18))
////////////////////////
Section Longitude Latitude ... Phosphate Alkalinity TCO2
0 06GA19960613 64.87 81.38 ... 0.214634 2236.3 2056.6
1 06GA19960613 64.87 81.38 ... 0.253659 2234.4 2068.4
2 06GA19960613 64.87 81.38 ... 0.390244 2247.1 2104.1
3 06GA19960613 64.87 81.38 ... 0.536585 2254.1 2120.5
4 06GA19960613 64.87 81.38 ... 0.595122 2270.4 2131.7
[5 rows x 17 columns]
0 7.79776
1 7.78997
2 7.74140
3 7.71428
4 7.69494
Name: pHSWS25, dtype: float64
The error:
Traceback (most recent call last):
File "ocean_data.py", line 60, in <module>
LinearRegression().fit(train_x,train_y)
File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/base.py", line 458, in fit
y_numeric=True, multi_output=True)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 756, in check_X_y
estimator=estimator)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 567, in check_array
array = array.astype(np.float64)
ValueError: invalid literal for float(): 06GA19960613
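The value in the error message, 06GA19960613, is the content of the Section column, so the failure appears to come from that string column being passed to fit. Below is a minimal sketch that reproduces the failed conversion on a hypothetical three-row stand-in for train.csv (values copied from the output above, most columns omitted). It uses select_dtypes to keep only the numeric columns, which is an assumption about what a fix could look like and may not be the right choice if Section carries useful information.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: a few columns and rows from the output above.
train_data = pd.DataFrame({
    'Section': ['06GA19960613'] * 3,
    'Longitude': [64.87, 64.87, 64.87],
    'Alkalinity': [2236.3, 2234.4, 2247.1],
    'pHSWS25': [7.79776, 7.78997, 7.74140],
})

train_x = train_data.drop(columns=['pHSWS25'])
train_y = train_data['pHSWS25']

# Attempt the same float conversion that check_array performs in the traceback.
try:
    train_x.astype(np.float64)
    conversion_failed = False
except ValueError:
    # The string column Section cannot be converted to float.
    conversion_failed = True

# Keep only the numeric columns, which leaves Section out of the features.
numeric_x = train_x.select_dtypes(include=[np.number])
```

With the features restricted this way, numeric_x converts to float without error, so LinearRegression().fit(numeric_x, train_y) should get past the validation step. Whether dropping Section is acceptable, or whether it should be encoded instead, is something I am unsure about.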
Could anyone help me solve this issue?