1

I'm using R to construct an ordinal logistic regression model for response variable "Total.Spent".

First up, I'm aware that the lowest level for "Total.Spent" should be "$0 - $499" to avoid ambiguity but this isn't material to my problem.

Because the response variable is ordinal, and the model calculates cumulative probability, I need to get the levels for the response variable in the correct order. The default order is $0-$500, $1,000-$2,499, $2,500-$4,999, $5,000+, $500-$999. There are missing values in the data set.

I've reordered one of the predictor variables "Month", (and the code returns the expected result),but when I attempt to re-order the response variable, the values in my "Total.Spent" column in my data frame are removed, giving me an empty column.

Here's the code I've used:

# Re-order Month so that months appear in calendar order, beginning in April
months_in_survey_year_order <- c("April", "May", "June", "July", "August", "September",
                                 "October", "November", "December", "January", 
                                 "February", "March")

KI_Data$Month <- factor(KI_Data$Month,
                        levels = months_in_survey_year_order)

# Re-order Total.Spent so that the levels appear in the correct order
spend_in_order <- c("$0 - $500", "$500 - $999", "$1,000 - $2,499", 
                    "$2,500 - $4,999", "$5,000+")

KI_Data$Total.Spent <- factor(KI_Data$Total.Spent,
                         levels = spend_in_order) # This line is deleting all of the Total.Spent values

I have no idea what's going on to cause the values in KI_Data$Total.Spent to disappear. I'm wondering whether it's because this is the model's response variable? Maybe it's because there's missing values in the data set? Or is there something else going on?

I'd appreciate any guidance.

Thanks, Kirsten

  • You'll need to provide a sample of the data, specifically the original `Total.Spent` variable. I suspect that the problem has to do with spaces before and after the dashes. – Edward Jan 05 '23 at 06:15
  • 1
    I might also be a situation where using the `labels` argument, instead of the `levels` argument is appropriate. This [post](https://stackoverflow.com/a/5869675/5456906) might help. – xilliam Jan 05 '23 at 09:06
  • Thanks @Edward and I appreciate your help. The data includes spaces before and after the dash in the lower bands, but no space between $5,000 and the + in the highest level. I'm not actually sure how to share the data here, first time poster. I'll give the labels argument a go, and see what happens. – Kirsten Bradley Jan 05 '23 at 19:48
  • @KirstenBradley. Use `dput(head(KI_Data$Total.Spent))` – Edward Jan 06 '23 at 02:28
  • @Edward I think I've found the issue thanks to your help. Here's the data: `dput(head(KI_Data$Total.Spent)) c(" $1,000 - $2,499", " $1,000 - $2,499", " $1,000 - $2,499", " $500 - $999", " $500 - $999", " $500 - $999")` I suspect those leading spaces are the issue, will fix and retry. I'm not sure how they got there, must have been in the original data and I missed them during the clean-up. Thanks again! – Kirsten Bradley Jan 07 '23 at 01:16
  • Just an update for anyone who encounters this issue in the future: The leading spaces appeared during the conversion from .xlsx to .csv format. I also tested converting from a Google Sheets spreadsheet to .csv, and got leading spaces in that case too. The Total.Spend column was the only column that picked up leading spaces. Once I cleaned up the leading spaces, the ordering works perfectly. – Kirsten Bradley Jan 07 '23 at 02:12

0 Answers0