
I have a dataset for which I have to develop various models and compute the adjusted R2 value of all models.

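    from sklearn.metrics import make_scorer, r2_score
    from sklearn.model_selection import KFold, cross_val_score

    def cv_r2(clf, x, y):  # wrapper name is illustrative
        cv = KFold(n_splits=5, shuffle=True, random_state=45)
        r2 = make_scorer(r2_score)  # plain R2 (scoring="r2" is equivalent)
        r2_val_score = cross_val_score(clf, x, y, cv=cv, scoring=r2)
        scores = [r2_val_score.mean()]  # mean R2 over the 5 folds
        return scores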

I have used the above code to calculate the R2 value of every model, but I am more interested in the adjusted R2 value of each model. Is there any package in Python that can do this?

I would appreciate your help.

Ahamed Moosa
  • Possible duplicate of [How to get Adjusted R Square for Linear Regression](https://stackoverflow.com/questions/51023806/how-to-get-adjusted-r-square-for-linear-regression) – gosuto Oct 25 '19 at 18:45
  • Possible duplicate https://stackoverflow.com/questions/49381661/how-do-i-calculate-the-adjusted-r-squared-score-using-scikit-learn/49381947 – gosuto Oct 25 '19 at 19:00

2 Answers


You can calculate the adjusted R2 from R2 with a simple formula:

Adj r2 = 1-(1-R2)*(n-1)/(n-p-1)

Where n is the sample size and p is the number of independent variables.

Adjusted R2 also requires the number of independent variables, which is why `r2_score` does not compute it.
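Since `r2_score` only returns plain R2, one option is to apply the formula afterwards. A minimal sketch: the helper name `adjusted_r2` is hypothetical (it is not a scikit-learn function), and it assumes `p` excludes the intercept, matching the `n - p - 1` denominator above.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R2 from plain R2 (hypothetical helper).

    n: number of samples the R2 was computed on
    p: number of independent variables, intercept NOT counted
    """
    if n - p - 1 <= 0:
        # the formula is undefined unless there are more than p + 1 samples
        raise ValueError("need n > p + 1 samples")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Example: R2 = 0.80 measured on 50 samples with 5 features
print(adjusted_r2(0.80, n=50, p=5))
```

For R2 = 0.80 on 50 samples with 5 features this gives roughly 0.777, slightly below the plain R2, because the formula penalizes each extra variable.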

There
min2bro
  • Thanks, so I assume n = sample size and p = number of independent variables. – Ahamed Moosa Jun 26 '18 at 09:07
  • When we want to calculate adjusted R2 for each fold during cross-validation, will `n` correspond to the size of the dataset or the size of the fold? (e.g., 80% of the number of rows if we are doing 5-fold CV) @min2bro – nvergos Apr 25 '19 at 14:41
  • @nvergos n should correspond to the size of the fold. – jeffhale Aug 05 '20 at 17:34
  • Should I use the `n` and `p` of the train set whether I am evaluating on the train set or the test set? Or should I use the train set's `n` and `p` when evaluating on the train set, and the test set's `n` and `p` when evaluating on the test set? – vasili111 Feb 24 '21 at 23:05
  • @vasili111 We check model performance on test data, so it's better to check the adjusted R2 and R2 on test data. – Girish Kumar Chandora Jun 27 '21 at 16:33
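Following the comments above (score on held-out data, with `n` = rows in the fold being scored), one way to get adjusted R2 straight out of `cross_val_score` is to pass a callable scorer instead of `make_scorer(r2_score)`. scikit-learn accepts a scorer callable with signature `scorer(estimator, X, y)` and calls it on each test fold. This is a sketch: the name `adjusted_r2_scorer` and the synthetic data are assumptions for illustration, and `p` is taken as the number of feature columns (intercept not counted).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score


def adjusted_r2_scorer(estimator, X, y):
    """Hypothetical scorer; cross_val_score calls it once per held-out fold."""
    r2 = r2_score(y, estimator.predict(X))
    n, p = X.shape  # n = rows in this test fold, p = number of features
    # assumes each fold has more than p + 1 rows, otherwise this divides by <= 0
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Synthetic data for illustration: 100 rows, 3 features, small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

cv = KFold(n_splits=5, shuffle=True, random_state=45)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring=adjusted_r2_scorer)
print(scores.mean())  # mean adjusted R2 across the 5 test folds
```

Because the scorer only sees the test fold, `n` here is the fold size (20 rows per fold in this example), not the size of the whole dataset.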

It looks like Wikipedia has revised the adjusted R2 formula over time. To match the current state of the Wikipedia article, this would be the appropriate formula:

Adj r2 = 1-(1-R2)*(n-1)/(n-p)

Note that the last part is (n-p) instead of (n-p-1).

where:

n = count of rows in your dataset used for train or test
p = count of independent variables
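The two answers differ only in how p is counted. Assuming a model with an intercept and k independent variables, the first answer's denominator is n - k - 1. If p also counts the intercept (p = k + 1), then n - p equals n - k - 1 and both formulas give the same value; if p counts only the k independent variables, (n - p) applies a slightly smaller penalty. A sketch comparing the two (function names are hypothetical):

```python
def adj_r2_excluding_intercept(r2, n, k):
    # k = number of independent variables, intercept NOT counted
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def adj_r2_including_intercept(r2, n, p):
    # p = number of estimated coefficients, intercept counted
    return 1 - (1 - r2) * (n - 1) / (n - p)


r2, n, k = 0.80, 50, 5
a = adj_r2_excluding_intercept(r2, n, k)
b = adj_r2_including_intercept(r2, n, k + 1)  # p = k + 1: identical to a
c = adj_r2_including_intercept(r2, n, k)      # p = k: smaller penalty, larger value
print(a, b, c)
```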

sch-man