I have a data frame with a column like this:

retention_completion_variable_name <- data.frame(
  retention_completion_variable_name = c(
    "Completed Degree in 1 Year",
    "Retained to Midyear Year 1",
    "Completed Degree in 2 Years",
    "Retained to Midyear Year 2",
    "Retained to Start of Year 2"
  ),
  retention_completion_value = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)   

I want to sort the rows by this column in the following custom order:

       Retained to Midyear Year 1                     0              
       Retained to Start of Year 2                    1        
       Retained to Midyear Year 2                     1             
       Completed Degree in 1 Year                     0            
       Completed Degree in 2 Years                    0              
Sotos
rahul rajput
  • Possible duplicate of [How to sort a dataframe by column(s)?](https://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-columns) – Ale Jul 25 '17 at 12:18
  • This is not clear. How are you getting that order? I can understand the ordering by values (1 & 2 in your case) by group (Retained vs. Completed), but why all retained first and then all completed? – Sotos Jul 25 '17 at 12:27
  • @Ale The dupe target is pretty close but the answers there focus on ordering a column which is already given by the OP as an _ordered factor_. The Q here is more concerned IMHO with how to specify a particular order. – Uwe Jul 25 '17 at 13:40

1 Answer

This is one of the few cases where I feel factor() is really useful, because its levels argument lets you spell out the desired order explicitly:

lvls <- c("Retained to Midyear Year 1", "Retained to Start of Year 2", 
          "Retained to Midyear Year 2", "Completed Degree in 1 Year", 
          "Completed Degree in 2 Years")
DT$retention_completion_variable_name <- 
  factor(DT$retention_completion_variable_name, levels = lvls)
DT <- DT[order(DT$retention_completion_variable_name), ]
DT
  retention_completion_variable_name retention_completion_value
2         Retained to Midyear Year 1                          0
5        Retained to Start of Year 2                          1
4         Retained to Midyear Year 2                          1
1         Completed Degree in 1 Year                          0
3        Completed Degree in 2 Years                          0
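
If the column should stay a plain character vector, the same ordering can probably be obtained without converting it to a factor. The following is only a sketch of that alternative, not part of the original answer: it rebuilds the sample data and lvls so that it runs on its own, and the name sorted is made up for illustration.

```r
# Self-contained sketch: same sample data and level order as in the answer
DT <- data.frame(
  retention_completion_variable_name = c(
    "Completed Degree in 1 Year", "Retained to Midyear Year 1",
    "Completed Degree in 2 Years", "Retained to Midyear Year 2",
    "Retained to Start of Year 2"),
  retention_completion_value = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)
lvls <- c("Retained to Midyear Year 1", "Retained to Start of Year 2",
          "Retained to Midyear Year 2", "Completed Degree in 1 Year",
          "Completed Degree in 2 Years")

# match() gives each name's position in lvls; order() sorts the rows by it.
# Names missing from lvls get NA and are placed last by order().
sorted <- DT[order(match(DT$retention_completion_variable_name, lvls)), ]
sorted
```

The factor approach has the advantage that the order travels with the column (for example into plots and tables), whereas match() only affects this one sort.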

Data

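# Built directly with data.frame(): readr::read_table() in readr >= 2.0 splits
# on any whitespace, which would break the multi-word names apart
DT <- data.frame(
  retention_completion_variable_name = c(
    "Completed Degree in 1 Year", "Retained to Midyear Year 1",
    "Completed Degree in 2 Years", "Retained to Midyear Year 2",
    "Retained to Start of Year 2"),
  retention_completion_value = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)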

Enhancement

If there are many years to cover, creating the factor levels by hand would be cumbersome and error-prone. However, this can be automated by observing three rules:

  1. All "Retained" entries come before all "Completed" entries.
  2. Within "Retained", entries are ordered by year and, within each year, "Start" comes before "Midyear".
  3. Within "Completed", entries are ordered by year.

These rules can be used to create the factor levels programmatically:

n_years <- 5L
lvls <- c(paste(c("Retained to Start of Year", "Retained to Midyear Year"), 
                rep(seq_len(n_years), each = 2L)),
          # singular "Year" for year 1 so that the level matches the data
          sprintf("Completed Degree in %i Year%s", seq_len(n_years),
                  ifelse(seq_len(n_years) == 1L, "", "s")))
lvls
 [1] "Retained to Start of Year 1" "Retained to Midyear Year 1"  "Retained to Start of Year 2"
 [4] "Retained to Midyear Year 2"  "Retained to Start of Year 3" "Retained to Midyear Year 3" 
 [7] "Retained to Start of Year 4" "Retained to Midyear Year 4"  "Retained to Start of Year 5"
[10] "Retained to Midyear Year 5"  "Completed Degree in 1 Year"  "Completed Degree in 2 Years"
[13] "Completed Degree in 3 Years" "Completed Degree in 4 Years" "Completed Degree in 5 Years"
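
Generated levels only help if they match the strings in the data exactly, because factor() silently turns any value not found in levels into NA. The sketch below applies programmatically built levels to the sample data and checks for such mismatches. It is an illustration under assumptions, not part of the original answer: it assumes the data use the singular "Year" for year 1 (as the sample does), and the helper names yrs and f are invented here.

```r
n_years <- 5L
yrs <- seq_len(n_years)
# Build the levels following the three rules; "Year" is singular for year 1
lvls <- c(paste(c("Retained to Start of Year", "Retained to Midyear Year"),
                rep(yrs, each = 2L)),
          sprintf("Completed Degree in %i Year%s", yrs, ifelse(yrs == 1L, "", "s")))

DT <- data.frame(
  retention_completion_variable_name = c(
    "Completed Degree in 1 Year", "Retained to Midyear Year 1",
    "Completed Degree in 2 Years", "Retained to Midyear Year 2",
    "Retained to Start of Year 2"),
  retention_completion_value = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

f <- factor(DT$retention_completion_variable_name, levels = lvls)
# Values not found in lvls become NA, so stop early on any mismatch
stopifnot(!anyNA(f))

DT$retention_completion_variable_name <- f
DT <- DT[order(DT$retention_completion_variable_name), ]
DT
```

Levels that do not occur in the data (years 3 to 5 here) remain attached to the factor; droplevels() can remove them if they get in the way.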
Uwe