Questions tagged [logistic-regression]

Logistic regression is a statistical classification model used for making categorical predictions.

Logistic regression is a statistical method for predicting and understanding categorical dependent variables (e.g., binary true/false or multinomial outcomes) based on one or more independent variables (also called predictors, features, or attributes). The probabilities of the possible outcomes of a single trial are modeled as a function of the predictors using the logistic function:

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

A logistic regression model can be represented by:

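$$P(y = 1 \mid x) = \sigma(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}$$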
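As a minimal sketch of the model in code, assuming the standard binary form in which the logistic function 1 / (1 + e^(-t)) is applied to a linear combination of the predictors: the function names `sigmoid` and `predict_probability` and the coefficient values are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(t):
    """Logistic function: maps any real number into the interval [0, 1]."""
    # Branch on the sign of t so math.exp never receives a large positive argument.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def predict_probability(intercept, coefficients, features):
    """P(y = 1 | x) for a binary logistic regression model."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(linear_predictor)

# Hypothetical intercept, coefficients, and feature values (not fitted to any data).
p = predict_probability(-1.0, [0.8, -0.5], [2.0, 1.0])
print(p)
```

Splitting the computation on the sign of `t` is one common way to avoid overflow for very negative inputs; library implementations may handle this differently.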

The logistic regression model has the useful property that each exponentiated regression coefficient can be interpreted as the odds ratio associated with a one-unit increase in the corresponding predictor, holding the other predictors fixed.
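The odds-ratio interpretation can be checked numerically. The sketch below assumes a single-predictor model with hypothetical values for `intercept` and `slope`; it compares the ratio of the odds at x = 4 and x = 3 with the exponentiated coefficient.

```python
import math

def probability(intercept, slope, x):
    """P(y = 1 | x) for a one-predictor logistic regression model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical coefficients, not estimated from any data set.
intercept, slope = -2.0, 0.7

# Odds ratio for a one-unit increase in the predictor (from x = 3 to x = 4).
odds_ratio = odds(probability(intercept, slope, 4.0)) / odds(probability(intercept, slope, 3.0))

# The two printed values agree up to floating-point rounding.
print(odds_ratio, math.exp(slope))
```

The starting value of x does not matter here: the intercept and the common part of the linear predictor cancel in the ratio, leaving only the exponentiated slope.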

Multinomial logistic regression (i.e., with three or more possible outcomes) is also sometimes called a Maximum Entropy (MaxEnt) classifier in the machine learning literature.
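In the multinomial case, class probabilities are typically obtained by applying the softmax function to one linear score per class. The following is a minimal sketch under that assumption; the function name `softmax` and the example scores are illustrative and do not correspond to any particular library's API.

```python
import math

def softmax(scores):
    """Turn a list of real-valued class scores into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but prevents overflow in exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)

# With two classes and the second score fixed at 0, softmax reduces to the logistic function.
two_class = softmax([0.3, 0.0])[0]
print(two_class, 1.0 / (1.0 + math.exp(-0.3)))
```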


Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

3746 questions
303
votes
26 answers

How to implement the Softmax function in Python

From Udacity's deep learning class, the softmax of y_i is simply the exponential divided by the sum of the exponentials of the whole Y vector: Where S(y_i) is the softmax function of y_i and e is the exponential and j is the no. of columns in the…
alvas
  • 115,346
  • 109
  • 446
  • 738
148
votes
2 answers

Logistic regression python solvers' definitions

I am using the logistic regression function from sklearn, and was wondering what each of the solvers is actually doing behind the scenes to solve the optimization problem. Can someone briefly describe what "newton-cg", "sag", "lbfgs" and "liblinear"…
Clement
  • 1,630
  • 3
  • 12
  • 10
115
votes
4 answers

ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT

I have a dataset consisting of both numeric and categorical data and I want to predict adverse outcomes for patients based on their medical characteristics. I defined a prediction pipeline for my dataset like so: X =…
sums22
  • 1,793
  • 3
  • 13
  • 25
101
votes
3 answers

How to choose cross-entropy loss in TensorFlow?

Classification problems, such as logistic regression or multinomial logistic regression, optimize a cross-entropy loss. Normally, the cross-entropy layer follows the softmax layer, which produces probability distribution. In tensorflow, there are at…
Maxim
  • 52,561
  • 27
  • 155
  • 209
90
votes
4 answers

ValueError: Unknown label type: 'unknown'

I try to run following code. import pandas as pd import numpy as np from sklearn.linear_model import LogisticRegression # data import and preparation trainData = pd.read_csv('train.csv') train = trainData.values testData =…
Ivan Zhovannik
  • 1,073
  • 1
  • 8
  • 8
84
votes
5 answers

Roc curve and cut off point. Python

I ran a logistic regression model and made predictions of the logit values. I used this to get the points on the ROC curve: from sklearn import metrics fpr, tpr, thresholds = metrics.roc_curve(Y_test,p) I know metrics.roc_auc_score gives the area…
Shiva Prakash
  • 1,849
  • 4
  • 21
  • 25
78
votes
5 answers

sklearn Logistic Regression "ValueError: Found array with dim 3. Estimator expected <= 2."

I attempt to solve this problem 6 in this notebook. The question is to train a simple model on this data using 50, 100, 1000 and 5000 training samples by using the LogisticRegression model from sklearn.linear_model. lr =…
edwin
  • 1,152
  • 1
  • 13
  • 27
75
votes
2 answers

How to find the importance of the features for a logistic regression model?

I have a binary prediction model trained by a logistic regression algorithm. I want to know which features (predictors) are more important for the decision of positive or negative class. I know there is a coef_ parameter which comes from the scikit-learn…
mgokhanbakal
  • 1,679
  • 1
  • 20
  • 26
59
votes
2 answers

What is the inverse of regularization strength in Logistic Regression? How should it affect my code?

I am using sklearn.linear_model.LogisticRegression in scikit learn to run a Logistic Regression. C : float, optional (default=1.0) Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values…
41
votes
2 answers

scikit-learn return value of LogisticRegression.predict_proba

What exactly does the LogisticRegression.predict_proba function return? In my example I get a result like this: array([ [4.65761066e-03, 9.95342389e-01], [9.75851270e-01, 2.41487300e-02], [9.99983374e-01, 1.66258341e-05] ]) From other…
41
votes
5 answers

Controlling the threshold in Logistic Regression in Scikit Learn

I am using the LogisticRegression() method in scikit-learn on a highly unbalanced data set. I have even turned the class_weight feature to auto. I know that in Logistic Regression it should be possible to know what is the threshold value for a…
40
votes
3 answers

AttributeError: 'str' object has no attribute 'decode' in fitting Logistic Regression Model

I am currently trying to create a binary classifier using logistic regression. I am currently determining the feature importance. I already did the data preprocessing (One Hot Encoding and sampling) and ran it with XGBoost and…
user2552108
  • 1,107
  • 3
  • 15
  • 30
36
votes
4 answers

R: Calculate and interpret odds ratio in logistic regression

I am having trouble interpreting the results of a logistic regression. My outcome variable is Decision and is binary (0 or 1, not take or take a product, respectively). My predictor variable is Thoughts and is continuous, can be positive or…
Sudy Majd
  • 365
  • 1
  • 4
  • 7
35
votes
2 answers

Scikit Learn: Logistic Regression model coefficients: Clarification

I need to know how to return the logistic regression coefficients in such a manner that I can generate the predicted probabilities myself. My code looks like this: lr = LogisticRegression() lr.fit(training_data, binary_labels) # Generate…
zbinsd
  • 4,084
  • 6
  • 33
  • 40
32
votes
5 answers

Why the cost function of logistic regression has a logarithmic expression?

The cost function for logistic regression is cost(h(theta)X,Y) = -log(h(theta)X) or -log(1-h(theta)X). My question is what is the basis for using a logarithmic expression in the cost function. Where does it come from? I believe you can't just put…
Nipun Alahakoon
  • 2,772
  • 5
  • 27
  • 45