Logistic Regression in Python

Question

I am currently working on logistic regression for machine learning in Python. This is the code I wrote:

import pandas as pd
from sklearn import linear_model
import numpy as np
from sklearn.utils import column_or_1d

logistic = linear_model.LogisticRegression()

data = pd.read_excel('/home/mick/PycharmProjects/project1/excel/Ron95_Price_Class.xlsx')

X = data[['Date']]
y = data[['Ron95_RM']]

y = np.ravel(y)

logistic.fit(X, y)

price = logistic.predict(42491)
print "The price for Ron95 in next month will be RM", np.array_str(price,1)

This is the output of the code

The price for Ron95 in next month will be RM [ u'B']

There is no error, but my question is whether the characters after RM in the output should be 'B' or some other characters. I wonder whether I wrote the code wrongly or whether it is just a formatting problem with the NumPy array.

Because I basically just started with Python today, sorry if I just made a stupid mistake.

https://drive.google.com/open?id=0BzvrBlV2c5P-bGt4VG85emNnbXc This is the xlsx file. As for 42491, it is just a date value. I found that the code I use cannot parse the date format in the xlsx file. – Mick May 23 '16 at 13:35
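For readers wondering what 42491 stands for: assuming the workbook uses Excel's default 1900 date system (an assumption, since the file itself is not shown here), the number is a day count that can be converted with the standard library. The helper names below are hypothetical and only serve as a minimal sketch.

```python
from datetime import date, timedelta

# Assumption: Excel's default 1900 date system, in which serial numbers
# count days from 1899-12-30 (valid for dates from March 1900 onward).
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial_to_date(serial):
    """Convert a whole-number Excel serial to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)

def date_to_excel_serial(day):
    """Convert a calendar date back to its Excel serial number."""
    return (day - EXCEL_EPOCH).days

print(excel_serial_to_date(42491))             # 2016-05-01
print(date_to_excel_serial(date(2016, 6, 1)))  # 42522
```

Under that assumption, the value passed to predict corresponds to 1 May 2016.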

score 0 · Answer 1 · answered May 23 '16 at 13:29

I think it would be easier to help if you posted some data from Ron95_Price_Class.xlsx.
For now, I see that you do not remove the target variable (y) from the training data. You can do that with:

X = data['Date']             # single brackets are enough when you
y = data['Ron95_RM']         # select only one column
X = data.drop('Ron95_RM')

answered May 23 '16 at 13:29

LinearLeopard


ValueError: labels ['Ron95_RM'] not contained in axis – Mick May 23 '16 at 13:33
Oops, sorry, my fault. Right, there is not 'Ron95_RM' in train data. I have to go sleep :) Could you post some lines from Ron95_Price_Class.xlsx, or print(X,y)? – LinearLeopard May 23 '16 at 13:53
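A likely explanation for the ValueError reported above, offered as an assumption because the full traceback is not shown: `DataFrame.drop` looks for the label in the row index by default (`axis=0`), so a column name is not found there. The sketch below uses made-up rows; only the column names come from the question.

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the question.
data = pd.DataFrame({'Date': [42370, 42401, 42430],
                     'Ron95_RM': ['A', 'B', 'B']})

y = data['Ron95_RM']               # 1-D Series holding the target
X = data.drop('Ron95_RM', axis=1)  # axis=1 looks the label up among the columns

print(list(X.columns))  # ['Date']
print(X.shape)          # (3, 1)
```

Note that the question's `X = data[['Date']]` already excludes the target column, so no drop is needed there.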

score 0 · Accepted Answer · edited May 23 '17 at 10:28

If I am not mistaken, the 'u' is just notation indicating that the string is a unicode string. I am not sure how you are running your code, but when I test it in an IPython notebook or in a Windows command prompt, I get the following output:

The price for Ron95 in next month will be RM [ 'B']

This is perhaps because I ran this in Python 3.5, whereas it appears you are still using a Python version earlier than 3.0.

It's not that your answer is wrong; you are just seeing information about the format of the data. For other questions on this subject see here and here. The Python how-to on Unicode may also be helpful.
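To illustrate in Python 3 (a minimal sketch with made-up variable names): a container prints the repr of each element, which is where the quotes come from, and in Python 2 the repr of a unicode string also carries the u prefix.

```python
label = 'B'            # a text string; Python 2 shows a unicode string as u'B'
container = [label]

print(repr(label))     # 'B'  (the quotes belong to the repr, not to the data)
print(str(container))  # ['B']  (containers show the repr of each element)
print(label)           # B
```

The stored value is the single character B in every case; only its displayed form differs.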

score 0 · Answer 3 · answered May 23 '16 at 14:33

The predict method, as described in the scikit-learn documentation (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.predict), returns an array of shape [n_samples]. So in your case the result is an array with a single element. To get the desired output, you can try "price[0]".
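As a rough sketch of that suggestion, the example below fits a classifier on toy data invented for illustration (it is not the asker's spreadsheet) and indexes the one-element result. It also passes the sample as a 2-D array, which current scikit-learn versions require.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data made up for illustration: one feature, two class labels.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

model = LogisticRegression().fit(X, y)

# predict expects a 2-D array of shape (n_samples, n_features).
prediction = model.predict(np.array([[4.5]]))
print(prediction.shape)  # (1,)

# Index the array to get the bare label instead of the array's printed form.
label = prediction[0]
print("Predicted class:", label)
```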

answered May 23 '16 at 14:33

pmaniyan


Cool, I would really appreciate if you could do a +vote or select my answer as the right one. Thanks. – pmaniyan May 24 '16 at 03:46
