I have the dataframe 'rankedvariableslist', with the index 'Sleepvariables' being the sleep variable of interest, and the two columns being the R-squared and P-value of that model and variable respectively.

I am trying to sort the data in ascending order by 'P-value', then by 'R-squared value', but I keep getting the error "'values' is not ordered, please explicitly specify the categories order by passing in a categories argument", and I am not sure why.

I would be so grateful for a helping hand!

import pandas as pd

correspondantsleepvariable = []
correspondantpvalue = []
rsquaredvalues = []

# The lines below presumably run once per model (index i); the enclosing loop is not shown.
newerresults = resultmodeldistancevariation2sleepsummary.tables[0]
newerdata = pd.DataFrame(newerresults)
rsquaredvalue = newerdata.iloc[0,3]
rsquaredvalues.append(rsquaredvalue)
modelpvalues = resultmodeldistancevariation2sleepsummary.tables[1]
newerdatavalues = pd.DataFrame(modelpvalues)
pvalue = newerdatavalues.iloc[12,4]
correspondantpvalue.append(pvalue)
correspondantsleepvariable.append(sleepvariable[i])
rankedvariableslist.sort_values(['P-value','R-squared value'],ascending = [True, False])
print(rankedvariableslist.head(3))

                         Sleepvariables  R-squared value P-value
0                        hours_of_sleep           0.026   0.491
1              frequency_of_alarm_usage           0.026   0.681
2                        sleepiness_bed           0.026   0.413

As an example, the summary table stored in 'newerresults' looks like this:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:               distance   R-squared:                       0.028
Model:                            OLS   Adj. R-squared:                  0.016
Method:                 Least Squares   F-statistic:                     2.338
Date:                Fri, 18 Nov 2022   Prob (F-statistic):            0.00773
Time:                        12:39:29   Log-Likelihood:                -1274.1
No. Observations:                 907   AIC:                             2572.
Df Residuals:                     895   BIC:                             2630.
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
==============================================================================
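
A side note on the sort call in the snippet above: `sort_values` returns a new DataFrame and leaves the original untouched unless `inplace=True` is passed. The sketch below illustrates this with plain floats copied from the printed rows; it assumes the columns already hold numbers, which (as the thread goes on to show) was not the case for the real data.

```python
import pandas as pd

# Illustrative data copied from the three printed rows; plain floats are assumed.
ranked = pd.DataFrame({
    "Sleepvariables": ["hours_of_sleep", "frequency_of_alarm_usage", "sleepiness_bed"],
    "R-squared value": [0.026, 0.026, 0.026],
    "P-value": [0.491, 0.681, 0.413],
})

# The return value is discarded here, so `ranked` keeps its original row order.
ranked.sort_values(["P-value", "R-squared value"], ascending=[True, False])
order_before = ranked["Sleepvariables"].tolist()

# Assigning the result keeps the sorted frame.
ranked_sorted = ranked.sort_values(
    ["P-value", "R-squared value"], ascending=[True, False]
)
order_after = ranked_sorted["Sleepvariables"].tolist()

print(order_before)
print(order_after)
```
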
Caledonian26
  • a little experiment. Use this before sorting: `df['R-squared value'] = df['R-squared value'].astype('category').cat.as_ordered()` then `df['P-value'] = df['P-value'].astype('category').cat.as_ordered()` – Bushmaster Nov 18 '22 at 12:22
  • sadly, that does not order the categories either... when I print df['R-squared value] - I get: Sleepvariables hours_of_sleep 0.026 frequency_of_alarm_usage 0.026 sleepiness_bed 0.026 sleepiness_waking 0.025 sleep_quality 0.030 nap_duration_mins 0.026. Thus it doesn't seem to be ordering the categories as it should... :( – Caledonian26 Nov 18 '22 at 13:45
  • When I enter just the line: "rankedvariableslist[['Value']].sort_values(by=['P-value','R-squared value'],ascending = [True,False],inplace=True)", I get an error message: KeyError: 'P-value'. Why might this be? – Caledonian26 Nov 18 '22 at 13:47
  • Are you sure all values are float or int in the R-squared value and P-value columns? – Bushmaster Nov 18 '22 at 14:08
  • try: `df['Sleepvariables'] = df['Sleepvariables'].astype('category').cat.as_ordered()` – Bushmaster Nov 18 '22 at 14:11
  • when I tried to convert to float, I got this error: float() argument must be a string or a number, not 'Cell' – Caledonian26 Nov 18 '22 at 14:12
  • when I add in the line you suggested above, it doesn't give me any errors but is still not ordered :( not sure why – Caledonian26 Nov 18 '22 at 14:15
  • Can you write the code you are trying to convert to float? – Bushmaster Nov 18 '22 at 14:17
  • rankedvariableslist['R-squared value'] = rankedvariableslist['R-squared value'].astype(float) :) – Caledonian26 Nov 18 '22 at 14:18
  • rankedvariableslist['P-value'] = rankedvariableslist['P-value'].astype(float) – Caledonian26 Nov 18 '22 at 14:18
  • I think this is an error from the dataset. Can you take a look at this [question](https://stackoverflow.com/questions/37286844/typeerror-float-argument-must-be-a-string-or-a-number-when-reading-a-list): – Bushmaster Nov 18 '22 at 14:22
  • 0 0.026 1 0.026 2 0.026 3 0.025 4 0.030 5 0.026 6 0.026 7 0.026 8 0.026 9 0.034 10 0.026 11 0.027 12 0.026 13 0.026 14 0.025 15 0.026 16 0.025 17 0.026 18 0.026 19 0.026 20 0.028 – Caledonian26 Nov 18 '22 at 14:26
  • I have tried printing rankedvariableslist['R-squared value'] and I'm not sure how to convert these values to float? – Caledonian26 Nov 18 '22 at 14:27
  • I have amended the code above to show where the values came from earlier on - could there be something wrong with the way I have stored the output of previous variables etc? – Caledonian26 Nov 18 '22 at 14:30
  • https://stackoverflow.com/questions/74491328/float-argument-must-be-a-string-or-a-number-not-cell-cannot-solve-issue - please see my new thread! – Caledonian26 Nov 18 '22 at 14:40
  • don't need this. I did the same operations with another dataset. As a result, you are making the first mistake when converting the model to a dataframe. – Bushmaster Nov 18 '22 at 14:44
  • https://stackoverflow.com/questions/51734180/converting-statsmodels-summary-object-to-pandas-dataframe – Bushmaster Nov 18 '22 at 14:45
  • I have now found a solution! Thank you so much for your help :) – Caledonian26 Nov 18 '22 at 17:26
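
The comments above converge on the two columns not holding plain floats. As a minimal sketch of the general idea (convert to a numeric dtype first, then sort), the example below assumes the values are available as numeric strings; the rows are copied from output printed elsewhere in this thread. It does not handle statsmodels `Cell` objects, which is the case the `float()` error in the comments points to.

```python
import pandas as pd

# Assumption: the values arrived as text, so pandas stores them as strings, not floats.
ranked = pd.DataFrame({
    "Sleepvariables": [
        "sleep_quality",
        "sleepiness_resolution_index",
        "time_spent_awake_during_night_mins",
    ],
    "R-squared value": ["0.030", "0.028", "0.034"],
    "P-value": ["0.041", "0.129", "0.005"],
})

# Convert both columns to a numeric dtype before sorting.
for column in ["R-squared value", "P-value"]:
    ranked[column] = pd.to_numeric(ranked[column])

# Smallest p-value first; ties broken by the largest R-squared.
ranked = ranked.sort_values(
    by=["P-value", "R-squared value"], ascending=[True, False]
)
print(ranked.dtypes)
print(ranked)
```
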

1 Answer

The following code worked: instead of converting the model summary output directly to a dataframe, I first converted it to HTML and read that back with pd.read_html.

correspondantsleepvariable = []
correspondantpvalue = []
rsquaredvalues = []

# Per-model steps (index i); the enclosing loop is not shown.
# Round-trip the first summary table through HTML so pandas parses plain values.
results_as_html = resultmodeldistancevariation2sleepsummary.tables[0].as_html()
datehere = pd.read_html(results_as_html, header=None, index_col=None)[0]
rsquaredvalue = float(datehere.iloc[0,3])
rsquaredvalues.append(rsquaredvalue)
# Same round trip for the coefficient table, which holds the p-values.
results_as_html = resultmodeldistancevariation2sleepsummary.tables[1].as_html()
datehere = pd.read_html(results_as_html, header=0, index_col=0)[0]
pvalue = float(datehere.iloc[11,3])
correspondantpvalue.append(pvalue)
correspondantsleepvariable.append(sleepvariable[i])

# Build the ranking table from the collected lists, then sort it in place.
rankedvariableslist = pd.DataFrame({'Sleepvariables':correspondantsleepvariable, 'R-squared value':rsquaredvalues,'P-value':correspondantpvalue})
rankedvariableslist.sort_values(by=['P-value','R-squared value'],ascending = [True,False],inplace=True)
print(rankedvariableslist)

                         Sleepvariables  R-squared value  P-value
9    time_spent_awake_during_night_mins            0.034    0.005
4                         sleep_quality            0.030    0.041
20          sleepiness_resolution_index            0.028    0.129

Thanks so much for all your help - I am so grateful!

Caledonian26