Performing logistic regression analysis in python using sklearn

Question

I am trying to perform a logistic regression analysis, but I can't tell which part of my code is wrong. It raises an error on the line logistic_regression.fit(X_train, y_train), although the code looks fine compared with other sources I checked. Can anybody help? Here is my code:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("/Users/utkusenel/Documents/Data Analyzing/data.csv", header=0, sep=";")
data = pd.DataFrame(df)

x = data.drop(columns=["churn"])  #features
y = data.churn  # target variable
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)

state;account_length;area_code;international_plan;voice_mail_plan;number_vmail_messages;total_day_minutes;total_day_calls;total_day_charge;total_eve_minutes;total_eve_calls;total_eve_charge;total_night_minutes;total_night_calls;total_night_charge;total_intl_minutes;total_intl_calls;total_intl_charge;number_customer_service_calls;churn;
1;KS;128;area_code_415;no;yes;25;265.1;110;45.07;197.4;99;16.78;244.7;91;11.01;10;3;2.7;1;no
2;OH;107;area_code_415;no;yes;26;161.6;123;27.47;195.5;103;16.62;254.4;103;11.45;13.7;3;3.7;1;no
@dm2 — Utku Şenel, Jul 27 '20 at 23:05
Please edit your question and include sample data. See how it's done here https://stackoverflow.com/q/20109391/6692898 — RichieV, Jul 27 '20 at 23:18
what is the exception being raised? Can you include that too? — RichieV, Jul 27 '20 at 23:20
Please notice that any code that comes *after* the error is irrelevant to the issue (since never executed) and should not be included here (it just creates unnecessary clutter). Same holds for irrelevant imports (edited out - look how much cleaner your code looks now). — desertnaut, Jul 28 '20 at 09:36
Sorry guys for the lack of format knowledge. I am new to this platform. @RichieV — Utku Şenel, Jul 28 '20 at 19:17
@UtkuŞenel If you think the answer below answers your question, please make sure to mark it answered by clicking on the check mark below upvote and downvote buttons. — Farid Jafri, Jul 29 '20 at 12:45

score 2 · Accepted Answer · answered Jul 28 '20 at 01:14

There are multiple problems here.

Your header row has a ';' at the end, so pandas is going to read an extra column. You need to remove that ';' after churn.
The training data that you are trying to use here, X_train, has several text/categorical columns (in your sample: state, area_code, international_plan and voice_mail_plan). LogisticRegression needs numeric input, so you need to convert these into numbers. Check out OneHotEncoder here: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html and LabelEncoder here: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html

After you have converted your text and categorical data to numbers and removed the extra ';' separator, run your algorithm again.
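
As a rough sketch of those two fixes (not necessarily how you should structure the final script), the snippet below strips the stray ';' from the header and one-hot encodes the text columns before fitting. The rows are made up and only use a handful of the column names from your sample; the inline string stands in for your data.csv, and wrapping everything in a ColumnTransformer and pipeline with StandardScaler is my own choice, not something the data requires. Once the header has one field fewer than the data rows, pandas treats the leading number in each row as the row index.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows in the same layout as the sample in the comments:
# the header ends with a stray ';' and every data row starts with a row number.
raw = """state;international_plan;total_day_minutes;number_customer_service_calls;churn;
1;KS;no;265.1;1;no
2;OH;no;161.6;1;no
3;NJ;no;243.4;0;no
4;OH;yes;299.4;2;yes
5;OK;yes;166.7;3;yes
6;AL;yes;223.4;0;no
7;MA;no;218.2;3;yes
8;MO;yes;157.0;0;no
9;LA;no;184.5;4;yes
10;WV;yes;258.6;5;yes
"""

# Fix 1: drop the trailing ';' from the header line only.
header, _, body = raw.partition("\n")
cleaned = header.rstrip(";") + "\n" + body
data = pd.read_csv(io.StringIO(cleaned), sep=";")

X = data.drop(columns=["churn"])  # features
y = data["churn"]                 # target; string labels are fine for classifiers

# Fix 2: one-hot encode the text columns, scale the numeric ones.
categorical = X.select_dtypes(exclude="number").columns.tolist()
numeric = X.select_dtypes(include="number").columns.tolist()
preprocess = ColumnTransformer([
    ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("numeric", StandardScaler(), numeric),
])
model = make_pipeline(preprocess, LogisticRegression())

# stratify keeps both classes in the tiny training split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(predictions, accuracy)
```

handle_unknown="ignore" is there because a state that only shows up in the test split would otherwise make the encoder raise at predict time. With your real file you would apply the same header fix to the text read from data.csv before handing it to read_csv, or simply edit the file once by hand.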
