I have some files that I am listing using:

dir <- list.files("/data/2014", pattern = "\\.img$", full.names = TRUE)

An example of a file listed in `dir`:

"/data/2014/file300.data.20141231.MC.9.vgf.img"

All files share the same name pattern and differ only in the date (here 20141231) and the hour (here 9).

R lists the files in date order, which is fine, but it messes up the hour order like this:

    10 1 11 12.....20 2 21 22....24 3 4....   

The order should be: 0 1 2 3 4 5 6 ..... 10 11 ..... 20 21 .... 24

I tried `mixedsort` from gtools, with no success.

xx <- c('file300.data.20141231.MC.10.vgf.img', 'file300.data.20141231.MC.24.vgf.img',
'file300.data.20141231.MC.9.vgf.img', 'file300.data.20141231.MC.1.vgf.img',
'file300.data.20141231.MC.2.vgf.img')

xx 
# [1] "file300.data.20141231.MC.10.vgf.img"                       
# [2] "file300.data.20141231.MC.24.vgf.img" 
# [3] "file300.data.20141231.MC.9.vgf.img" 
# [4] "file300.data.20141231.MC.1.vgf.img"
# [5] "file300.data.20141231.MC.2.vgf.img" 

Now test `mixedsort()`:

library(gtools)
dir1 <- mixedsort(xx)
dir1 
# [1] "file300.data.20141231.MC.10.vgf.img" 
# [2] "file300.data.20141231.MC.1.vgf.img" 
# [3] "file300.data.20141231.MC.2.vgf.img" 
# [4] "file300.data.20141231.MC.24.vgf.img" 
# [5] "file300.data.20141231.MC.9.vgf.img"

What I want is this:

# [1] "file300.data.20141231.MC.1.vgf.img" 
# [2] "file300.data.20141231.MC.2.vgf.img" 
# [3] "file300.data.20141231.MC.9.vgf.img" 
# [4] "file300.data.20141231.MC.10.vgf.img" 
# [5] "file300.data.20141231.MC.24.vgf.img"
Barry
  • How exactly did you try `mixedsort`? Can you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of a vector we can test with? – MrFlick Jul 20 '15 at 16:49
  • You can test with this: `xx <- 'file300.data.20141231.MC.9.vgf.img'`. `dir <- mixedsort(dir)` – Barry Jul 20 '15 at 16:54
  • That's a single value. How would you know if it sorted properly? – MrFlick Jul 20 '15 at 16:55
  • This is expected behavior. R is sorting the character string, and in character sorting, 10 comes before 1. comes before 11..... You may want to consider putting the filenames in a data frame and extracting different components of the string into new fields, then sort the data frame on the new fields. – Benjamin Jul 20 '15 at 16:58
  • @Barry Please edit the main question rather than putting your example in the comments. Also, you don't want all the `xx<-` *inside* the vector. That doesn't make a lot of sense. When you run mixed sort on that vector, it orders the numbers "correctly" in my opinion. What is the desired order for this sample given your definition? – MrFlick Jul 20 '15 at 17:10
  • @Barry That's not the order I get when I run your sample code. I get your desired output. Tested with `gtools_3.4.2` – MrFlick Jul 20 '15 at 17:19
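
The difference between character and numeric ordering mentioned in the comments can be seen in isolation with a minimal sketch. The `hours` vector is made up for illustration, and `method = "radix"` is used only so that the character sort does not depend on the session locale.

```r
# Hypothetical hour values stored as strings, as they appear in file names
hours <- c("10", "24", "9", "1", "2")

# Character sort compares strings left to right, so "10" lands before "2" and "9"
char_sorted <- sort(hours, method = "radix")

# Ordering by the numeric value gives the intended sequence
num_sorted <- hours[order(as.numeric(hours))]

print(char_sorted)
print(num_sorted)
```

This is why any fix has to pull the hour out of the name and compare it as a number.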

2 Answers


So it looks like mixedsort will only sort on the first set of numbers found in the string. In your example, it's sorting all of the 300's first, and then does character sorting on the date, and then on the hour. I changed your example data below to use a file310 and file301 so that you can see what's happening.

(Edited Example)

xx <- c('file300.data.20141231.MC.10.vgf.img',
        'file300.data.20141231.MC.24.vgf.img',
        'file300.data.20141231.MC.9.vgf.img',
        'file300.data.20141231.MC.1.vgf.img',
        'file300.data.20141231.MC.2.vgf.img')

gtools::mixedsort(xx)

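library(dplyr)
library(stringr)

# Split each name on "." into 7 fields (V1..V7): V3 is the date, V5 the hour.
# Note: data_frame() is deprecated in newer dplyr/tibble versions; tibble() is the replacement.
data_frame(xx = xx) %>%
  bind_cols(.,
            as.data.frame(str_split_fixed(xx, "[.]", 7),
                          stringsAsFactors = FALSE)) %>%
  mutate(V5 = as.numeric(V5)) %>%  # compare hours as numbers, not as strings
  arrange(V1, V3, V5)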
Benjamin
  • It still gave the same order! – Barry Jul 20 '15 at 17:26
  • You're right. I forgot to change V5 to a numerical value. My apologies. See the edited example (I also changed the xx value to get rid of file301 and file310). – Benjamin Jul 20 '15 at 17:30
  • If you save that data frame (add `sorted_data <- ` in front of `data_frame`), the filenames will still be in the first column of the sorted data. You can pull them out with `sorted_data$xx`. – Benjamin Jul 20 '15 at 17:39
  • Yes, but that is only the file name without the path. When I type `dir` in my example, it gives /data/2014/filename. How do I get back to the full path after reordering? – Barry Jul 20 '15 at 17:43
  • It only has the file name because that was the only piece I fed into the data frame. It will still work if you pass the full file name into the data frame. Or if you want to start with the full file name and work on just the file name without the directory, you can make a new column using `basename` – Benjamin Jul 21 '15 at 10:45
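
The `basename` suggestion in the last comment could look roughly like the following base R sketch. The `paths` vector is hypothetical, modeled on the single path shown in the question (the 20141230 entry is made up to show the date ordering), and the field positions 3 and 5 assume every name follows the `file300.data.<date>.MC.<hour>.vgf.img` layout.

```r
# Hypothetical full paths, modeled on the example in the question
paths <- file.path("/data/2014",
                   c("file300.data.20141231.MC.10.vgf.img",
                     "file300.data.20141231.MC.9.vgf.img",
                     "file300.data.20141230.MC.23.vgf.img",
                     "file300.data.20141231.MC.1.vgf.img"))

# Work on the file name only, so dots in the directory part cannot interfere
parts <- strsplit(basename(paths), ".", fixed = TRUE)

# Field 3 is the date (kept as text), field 5 is the hour (compared as a number)
date <- vapply(parts, `[`, character(1), 3)
hour <- as.numeric(vapply(parts, `[`, character(1), 5))

# Reorder the original full paths by date, then by hour
sorted_paths <- paths[order(date, hour)]
print(sorted_paths)
```

Because `order()` only returns positions, the full paths are kept intact and nothing has to be pasted back together afterwards.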
xx <- c('file300.data.20141231.MC.10.vgf.img',
    'file300.data.20141231.MC.24.vgf.img',
    'file300.data.20141231.MC.9.vgf.img',
    'file300.data.20141231.MC.1.vgf.img',
    'file300.data.20141231.MC.2.vgf.img')
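# Drop the fixed 25-character prefix "file300.data.20141231.MC." and split at ".v"
xxx <- unlist(strsplit(substr(xx, 26, 50), split = ".v", fixed = TRUE))
# Keep every other element (the hours) and convert them to numbers
yyy <- as.numeric(xxx[rep(c(TRUE, FALSE), length.out = length(xxx))])
xx[order(yyy)]

50 is just an upper bound on the position of the last character of your string. Of course, it is an overestimate in this example!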
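
If the prefix length is not always the same (for example when the full path is included), a regular expression avoids the hard-coded character positions. This is only a sketch of an alternative, not part of the answer above, and it assumes the hour is always the run of digits that directly follows `.MC.` in the name.

```r
xx <- c('file300.data.20141231.MC.10.vgf.img',
        'file300.data.20141231.MC.24.vgf.img',
        'file300.data.20141231.MC.9.vgf.img',
        'file300.data.20141231.MC.1.vgf.img',
        'file300.data.20141231.MC.2.vgf.img')

# Capture the digits between ".MC." and the next "." (assumed naming convention)
hour <- as.numeric(sub("^.*\\.MC\\.([0-9]+)\\..*$", "\\1", xx))

# Order the original strings by the numeric hour
sorted_xx <- xx[order(hour)]
print(sorted_xx)
```

The same call works unchanged on full paths, since the pattern ignores everything before `.MC.`.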

Shahab Einabadi