1

I've got a dataframe pulled from an external source with column labels based on year and week counts. Unfortunately, the columns pull in a weird non-sequential order (feature of the external dataset), and so when I report on them I'm going to want to pull the columns using "select" to get them in date-sequential order.

I want to insert a zero before the single-digit column labels below -- that is, "W1_2019" becomes "W01_2019" (and so forth for 2, 3, and up to 9), but not before the double-digit ones -- that is ""W10_2019" will remain as-is. The resulting column should allow me to order names(df) in ascending order, with W01 followed by W02 and W03. Without the zeros, of course, the order is W1 followed by W10 and then W2 which is exactly what I don't want.

See code below.

df<-setNames(
  data.frame(
  t(data.frame(c("1","2","1","3","2","3", "1")))
  ,row.names = NULL,stringsAsFactors = FALSE
  ), 
  c("W10_2018", "W50_2018", "W51_2018", "W52_2018", "W1_2019", "W2_2019", "W3_2019")
  )

names(df) = gsub(pattern="W#_.*", replacement = "W0#_", x=names(df))

sort(names(df))

The gsub line doesn't return an error, but it also doesn't change the names. The result is that the output of the "sort" line is:

[1] "W1_2019" "W10_2018" "W2_2019" "W3_2019" "W50_2018" "W51_2018" "W52_2018"

What it should look like if successful is:

[1] "W01_2019" "W02_2019" "W03_2019" "W10_2018" "W50_2018" "W51_2018" "W52_2018"

DanM
  • 337
  • 3
  • 9

1 Answers1

3

We can use mixedsort from gtools

library(gtools)
mixedsort(names(df))
#[1] "W1_2019"  "W2_2019"  "W3_2019"  "W10_2018" "W50_2018" "W51_2018" "W52_2018"

If we need to have consistency i.e. 2 digits after 'W', make some changes with sub. Grab the one digit that follows the 'W' and before the '_' as a capture group (2 digits will not be matched), then in the replacement, "W" followed by a digit 0, then backreference of captured group (\\1) and the _ and it should work

mixedsort(sub("W(\\d{1})_", "W0\\1_", names(df)))
#[1] "W01_2019" "W02_2019" "W03_2019" "W10_2018" "W50_2018" "W51_2018" "W52_2018"
akrun
  • 874,273
  • 37
  • 540
  • 662
  • OK, you did exactly what I asked ... and thanks for "mixedsort" which I did not know about at all. Now I realize I got my question wrong though. If I want a date-order sort, I've got to also change my string format from "W50_2018" to "2018_W50" in order to get a proper order. In other words all of year 2018 needs to precede all of 2019. Is that also a regex command? – DanM Apr 12 '19 at 16:35
  • 1
    @DanM That would be `sub("(.*)_(\\d{4})$", "\\2_\\1", names(df))` – akrun Apr 12 '19 at 16:37
  • 1
    Nope @akrun, that did the trick! Thanks once again for your assistance. You're a gem! – DanM Apr 12 '19 at 16:44