15

As I understand it, when creating a supervised learning model, our model may have high bias if we make very simple assumptions (for example, if our function is linear), which causes the algorithm to miss relationships between our features and the target output, resulting in errors. This is underfitting.

On the other hand, if we make our model too flexible (e.g. many polynomial features), it will be very sensitive to small fluctuations in the training set and will model the random noise in the training data rather than the intended outputs. This is overfitting.

[Image: curves illustrating underfitting and overfitting]

This makes sense to me, but I heard that a model can have both high variance and high bias, and I don't understand how that would be possible. If high bias and high variance are synonyms for underfitting and overfitting, then how can you have both on the same model? Is it possible? How can it happen? What does it look like when it does?

Alaa Awad
    Perhaps better on http://stats.stackexchange.com – Paul Aug 22 '15 at 22:00
  • [Bias–variance_tradeoff](https://en.wikipedia.org/wiki/Bias–variance_tradeoff) May be useful for you – Ibraim Ganiev Aug 23 '15 at 05:41
  • [another good article](https://theclevermachine.wordpress.com/2013/04/21/model-selection-underfitting-overfitting-and-the-bias-variance-tradeoff/) – Ibraim Ganiev Aug 23 '15 at 05:51
    http://stats.stackexchange.com/questions/4284/intuitive-explanation-of-the-bias-variance-tradeoff – saurabh agarwal Aug 24 '15 at 11:02
  • Can you specify where you heard about this? Both underfitting and overfitting are essentially characteristics of your model with respect to your training set. Thus, according to my understanding, a model can be underfitted and overfitted at the same time only for different training sets. – Aditya Patel Aug 24 '15 at 13:06
  • I found this in Andrew Ng's machine learning course on Coursera. Please watch the [video](https://www.coursera.org/learn/machine-learning/lecture/Kont7/learning-curves) starting from 00:15. You will find it within 00:30. – Md. Abu Nafee Ibna Zahid Feb 02 '18 at 14:59
  • @Md.AbuNafeeIbnaZahid There is a reason you retrain your models: the underlying characteristics of the data change over time. Thus a model with high variance can become one with high bias if the dataset changes. – Aditya Patel Mar 14 '18 at 08:09

2 Answers

16

Imagine a regression problem. I define a predictor which outputs the maximum of the target variable observed in the training data, for all possible inputs.

This model is both biased (it can only represent a single output, no matter how rich or varied the input) and has high variance (the max of a dataset will exhibit a lot of variability between datasets).
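A rough sketch of that argument (my own toy numbers, nothing canonical): retrain the "predict the training max everywhere" model on many independently drawn datasets and look at the distribution of its predictions.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_dataset(n=20):
    """True relationship y = x on [0, 1], observed with noise."""
    x = rng.uniform(0, 1, n)
    y = x + rng.normal(0, 0.3, n)
    return x, y

def fit_max(y_train):
    """The model: predict the training-set maximum for every input."""
    return float(np.max(y_train))

# Retrain on 200 independent training sets
preds = np.array([fit_max(sample_dataset()[1]) for _ in range(200)])

# High bias: the prediction sits near the max, far above the
# average target E[y] = 0.5, regardless of the query input.
print("mean prediction:", preds.mean())

# High variance: the training max jumps around between datasets.
print("spread between retrainings:", preds.std())
```

The mean prediction is far from the average target (bias) and the predictions fluctuate noticeably between training sets (variance), at the same time.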

You're right to a certain extent that bias means a model is likely to underfit and variance means it's susceptible to overfitting, but they're not quite the same.

Ben Allison
0

In my view, high bias and high variance can happen together when the fitted line chases outliers in the data.

[Image: a fitted line pulled toward outliers]