
I am looking into the time complexities of machine learning algorithms and I cannot find the time complexity of logistic regression for predicting a new input. I have read that for classification it is O(c*d), c being the number of classes and d the number of dimensions, and I know that for linear regression the search/prediction time complexity is O(d). Could you explain what the search/prediction time complexity of logistic regression is? Thank you in advance.

An example for other machine learning algorithms: https://www.thekerneltrip.com/machine/learning/computational-complexity-learning-algorithms/

Ana Smile
  • Sorry, it's my first question – Ana Smile Jan 17 '19 at 15:00
  • Complexity is not defined for a *problem* (i.e. logistic regression), but for specific *algorithms* that solve a problem; see https://www.quora.com/What-is-the-computational-complexity-of-a-classification-problem-Is-it-P-or-NP – desertnaut Jan 17 '19 at 15:02
  • @desertnaut this complexity is for training and not prediction – Ana Smile Jan 17 '19 at 15:09
  • Fair point; so, for `k` features it is the complexity of multiplying `k` pairs of numbers (i.e. the model coefficients X the features) and then adding them, plus one more addition operation for the bias term (per sample), right? (Still, it is unclear what you mean by "search"...) – desertnaut Jan 17 '19 at 15:18
  • I have added an article that may clarify what I need: it shows the training time complexity and the prediction/search time complexity, i.e. the complexity of predicting y for a new point x. – Ana Smile Jan 17 '19 at 15:26

1 Answer


Complexity of training for logistic regression with gradient-based optimization: O((f+1)csE), where:

  • f - number of features (+1 because of the bias). Multiplying each feature by its weight takes f operations (+1 for the bias), and summing all of them to obtain the prediction takes another f + 1 operations. The gradient step to improve the weights costs the same number of operations, so in total we get 4*(f+1) operations (f+1 multiplications plus f+1 additions for the forward pass, and the same again for the backward pass), which is simply O(f+1).
  • c - number of classes (possible outputs) of your logistic regression. For binary classification c = 1, so this term drops out. Each class has its own set of weights.
  • s - number of samples in your dataset; this one is quite intuitive, I think.
  • E - number of epochs you are willing to run gradient descent for (whole passes through the dataset).

Note: this complexity can change based on things like regularization (another c operations), but the idea behind it stays the same.
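To make the counting concrete, here is a minimal NumPy sketch of such a training loop (the function name and hyperparameters are mine, not from any library); each epoch does O((f+1)*c) work per sample, over s samples, for E epochs:

```python
import numpy as np

def train_logistic(X, y_onehot, epochs, lr=0.1):
    """Batch gradient descent for multiclass logistic regression (illustrative)."""
    s, f = X.shape                 # s samples, f features
    c = y_onehot.shape[1]          # c classes
    W = np.zeros((f, c))           # one weight vector per class
    b = np.zeros(c)                # the "+1" bias term, per class

    for _ in range(epochs):                          # E epochs
        logits = X @ W + b                           # forward: (f+1)*c ops per sample
        logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)    # softmax: O(c) per sample
        grad = probs - y_onehot                      # gradient of cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad) / s                   # backward: again (f+1)*c ops per sample
        b -= lr * grad.mean(axis=0)
    return W, b
```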

Complexity of predictions for one sample: O((f+1)c)

  • f + 1 - you simply multiply each weight by its feature value, add the bias, and sum all of it together at the end.
  • c - you do this for every class; c = 1 for binary predictions.
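Continuing the sketch above, prediction for one sample is just the forward pass (softmax is not even needed for the predicted label, since it preserves the argmax of the raw scores):

```python
def predict_one(x, W, b):
    scores = x @ W + b             # per class: f multiplications + 1 bias addition
    return int(np.argmax(scores))  # pick the highest-scoring of the c classes
```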

Complexity of predictions for many samples: O((f+1)cs)

  • (f+1)c - see complexity for one sample
  • s - number of samples
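And batch prediction is the same forward pass vectorized over all s samples:

```python
def predict_many(X, W, b):
    return np.argmax(X @ W + b, axis=1)  # O((f+1)*c) per sample, done s times
```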

Difference between logistic and linear regression in terms of complexity: activation function.

For multiclass logistic regression it will be softmax, while linear regression, as the name suggests, has a linear activation (effectively no activation). This does not change the complexity in big-O notation, but it is roughly another c operations per sample during training (exponentiating and normalizing the c class scores; I didn't want to clutter the picture above with it), multiplied by 2 for backprop.
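For completeness, here is what that extra activation looks like; note that its cost depends only on the number of classes c, not on the number of features f:

```python
def softmax(z):
    """Softmax over the c class scores of one sample."""
    z = z - z.max()     # subtract the max for numerical stability; result unchanged
    e = np.exp(z)
    return e / e.sum()  # c exponentials, one sum, c divisions: O(c)
```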

Szymon Maszke