1

I have this data,

df <- data.frame(V1= c("SF", "SF", "NYC"),
    V1_1 = c(1990, 2000, 1990),
    V1_10 = 1:3, V1_2 = 1:3, V2 = 1:3)

I want this to be in the order of

V1, V1_1, V1_2, V1_10, V2

I tried several different ways but they did not resulted as above.

halfer
  • 19,824
  • 17
  • 99
  • 186
Yun Tae Hwang
  • 1,249
  • 3
  • 18
  • 30
  • Tried several ways such as what? It can be helpful to see what *hasn't* worked, since it gives us a sense of your approach to the problem and the tools you're working with – camille Sep 19 '19 at 17:33
  • Several answers [here](https://stackoverflow.com/q/17531403/5325862) might be helpful – camille Sep 19 '19 at 17:40

2 Answers2

2

We can use mixedsort or mixedorder from gtools

gtools::mixedsort(names(df))
#[1] "V1"    "V1_1"  "V1_2"  "V1_10" "V2"   

df[gtools::mixedsort(names(df))]
akrun
  • 874,273
  • 37
  • 540
  • 662
1

gtools::mixedsort appears to be the slickest solution, but below I have added some code that gets you to the same place without the use of a specialised library...the code could be reworked to other situations...

Define the following function and combine with sapply to apply the function to the column names. That will give you a matrix of numerical values to sort on: the first row corresponds to the value next to "V" in the column name, while the second row corresponds to the number after "_" in the column name.

> convert_Name_To_Numbers <- function(x){
+   tempVec <- strsplit(x, "_")[[1]]
+   resultVec <- c(gsub("V","", tempVec[1]), 
+                  ifelse(is.na(tempVec[2]),"0", tempVec[2]))
+   return(as.double(resultVec))
+ }

> sortVec <- sapply(names(df), convert_Name_To_Numbers)
> sortVec
     V1 V1_1 V1_10 V1_2 V2
[1,]  1    1     1    1  2
[2,]  0    1    10    2  0

Next use order to re-order the column names using these two rows, i.e. first sorting on the 1st row, then using the 2nd row values.

> newColumns <- names(df)[order(sortVec[1,], sortVec[2,])]
> newColumns
[1] "V1"    "V1_1"  "V1_2"  "V1_10" "V2"

It's hopefully clear that the code could be easily tweaked to tackle more complex column names where you want more than two sort criteria, e.g. V1, V1_1, V1_1_1, V_1_2, ...