This is about ordering column names that contain both numbers and text. I have a data frame that resulted from dcast and has 200 rows. I have a problem with the column ordering.

The column names are in the following format:

names(DF) <- c('Testname1.1', 'Testname1.100', 'Testname1.11', 'Testname1.2', ..., 'Testname2.99')

Edit: I would like to have the columns ordered as:

names(DF) <- c('Testname1.1', 'Testname1.2', 'Testname1.3', ..., 'Testname1.100', 'Testname2.1', ..., 'Testname2.100')

The original input has a column that specifies the day, but it is not used when I 'cast' the data. Is there a way to tell the 'dcast' function to order the combined column names numerically?

What would be the easiest way to get the columns ordered as I need to in R?

Thanks a lot!

col. slade

3 Answers

I think you need to split the column before you can use it to order the data frame:

library("reshape2")  ## for colsplit()
library("gtools")

Construct test data:

dat <- data.frame(matrix(1:25,5))
names(dat) <- c('Testname1.1', 'Testname1.100',
     'Testname1.11','Testname1.2','Testname2.99')

Split and order:

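cdat <- colsplit(names(dat),"\\.",c("name","num"))
## rank each distinct prefix in natural order (mixedorder() itself returns a permutation, not a sort key)
lev <- unique(as.character(cdat$name))
dat[,order(match(cdat$name, lev[mixedorder(lev)]),cdat$num)]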

##   Testname1.1 Testname1.2 Testname1.11 Testname1.100 Testname2.99
## 1           1          16           11             6           21
## 2           2          17           12             7           22
## 3           3          18           13             8           23
## 4           4          19           14             9           24
## 5           5          20           15            10           25

The mixedorder() above (borrowed from @BondedDust's answer) is not really necessary for this example, but would be needed if the first (Testnamexx) component had more than 9 elements, so that Testname1, Testname2, and Testname10 would come in the proper order.
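For comparison, here is a minimal base-R sketch of the same split-and-order idea that avoids the extra packages. The helper name `natural_name_order` is hypothetical (it is not part of reshape2 or gtools), and it assumes every column name has the shape text, integer, period, integer, with the same text part throughout, as in the question.

```r
# Hypothetical helper: order names shaped like "Testname1.100"
# by the integer before the period, then by the integer after it.
natural_name_order <- function(nms) {
  pattern <- "^[^0-9]*([0-9]+)\\.([0-9]+)$"
  major <- as.integer(sub(pattern, "\\1", nms))  # e.g. 1 in "Testname1.100"
  minor <- as.integer(sub(pattern, "\\2", nms))  # e.g. 100 in "Testname1.100"
  order(major, minor)
}

# Same test data as above
dat <- data.frame(matrix(1:25, 5))
names(dat) <- c('Testname1.1', 'Testname1.100',
     'Testname1.11', 'Testname1.2', 'Testname2.99')

dat_sorted <- dat[, natural_name_order(names(dat))]
names(dat_sorted)
```

Names that do not match the assumed pattern come back from `sub()` unchanged and turn into `NA` in `as.integer()`, so this sketch only suits column names as regular as the ones in the question.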

Ben Bolker

The mixedorder and mixedsort functions in pkg:gtools sometimes do what is desired, but in this case I think the period separator is messing things up because it is treated as part of a numeric value, when it was clearly intended to be a separator rather than a decimal point. Try

nvec <- c('Testname1.1', 'Testname1.100', 'Testname1.11', 'Testname1.2', 'Testname2.99')
library(gtools)  # for mixedorder()

# Split each name at the period into a prefix and a numeric suffix
parts  <- strsplit(nvec, "\\.")
prefix <- sapply(parts, "[[", 1)
suffix <- as.numeric(sapply(parts, "[[", 2))

# mixedorder() returns a permutation, not a sort key, so rank the distinct prefixes first
lev   <- unique(prefix)
myvec <- nvec[order(match(prefix, lev[mixedorder(lev)]), suffix)]
IRTFM

One way would be:

library(gtools) #use gtools library
library(NCmisc) #use NCmisc library for pad.left()

myvec <- c('Testname1.1', 'Testname1.100','Testname1.11','Testname1.2','Testname2.99') #construct your vector

myvec[mixedorder(  paste(substring(myvec,1,9), pad.left(substring(myvec,11,100),'0') , sep='')  ) ] 

[1] "Testname1.1"   "Testname1.2"   "Testname1.11"  "Testname1.100" "Testname2.99"
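
The zero-padding idea can also be sketched in base R without hard-coding character positions. This is only an illustration under the assumption that every name ends in a period followed by digits; the variable names `prefix`, `suffix`, `width`, `padded` and `sorted` are made up for the example.

```r
myvec <- c('Testname1.1', 'Testname1.100', 'Testname1.11', 'Testname1.2', 'Testname2.99')

# Split each name at its last period
prefix <- sub("\\.[0-9]+$", "", myvec)          # "Testname1", ...
suffix <- as.integer(sub("^.*\\.", "", myvec))  # 1, 100, 11, 2, 99

# Zero-pad the suffix to the width of the longest one,
# so that an ordinary string sort puts 2 before 11 before 100
width  <- max(nchar(suffix))
padded <- sprintf(paste0("%s.%0", width, "d"), prefix, suffix)

sorted <- myvec[order(padded)]
sorted
```

Padding only fixes the part after the period. The prefix is still compared as plain text, so a prefix such as Testname10 would sort before Testname2 unless it is handled separately.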
LyzandeR
  • I upvoted simply for the mention of `gtools::mixedsort`, but if it were working as desired, it would not need to use substr. – IRTFM Dec 10 '14 at 23:52
  • hmm yeah you are right. I 'll have a look and fix asap. Thanks. mixedorder has helped me a lot too. – LyzandeR Dec 10 '14 at 23:55
  • With a bit of a clearer mind this morning, this also looks like a solution. Thanks for spotting my mistake previously. – LyzandeR Dec 11 '14 at 09:46