My CSV files are named 0.csv, 1.csv, and so on up to 50.csv. I want to read them into R in numerical order, but R keeps importing them as 0.csv, 1.csv, 10.csv, 11.csv, etc. What is the correct regex so that it reads the CSV files ordered numerically? Here is a minimal example:

  library(data.table)  # provides fread
  library(dplyr)       # provides mutate and %>%
  ldf <- list()
  list_Candidates <- dir(pattern = "[[:digit:]].csv")

This creates the list of all the CSV files in the directory.

  for (k in 1:length(list_Candidates)){
    ldf[[k]] <- fread(list_Candidates[k], sep=",")

This loops over the length of the list and reads each CSV file into the list.

    ldf_Header = colnames(ldf[[k]])
    ldf[[k]] = ldf[[k]][,1:(length(ldf[[k]])-1)] # deletes the last column 
    colnames(ldf[[k]]) = ldf_Header[-1]

This part deletes the last column and shifts the column names one position to the left (this is to correct a bug within the fread function).

    ldf[[k]] <- ldf[[k]] %>% mutate(Candidate=k-1) # creates an additional column and assigns Candidate number
  }

The final part creates a new column in each data frame that holds the number of the CSV file. It is therefore crucial that the files are imported in the correct order in the first place. Thank you! :)

Gregor Thomas
R_hub
  • See package `stringr`, functions `str_sort` and `str_order`, argument `numeric = TRUE` for a numeric order even when there's a mix of digits and alphabetic characters. – Rui Barradas Sep 22 '20 at 13:47
  • You can try `list_Candidates<- dir(pattern = "[[:digit:]]\\.csv") [order(as.numeric(gsub(".*\\D*(\\d+)\\.csv", "\\1", dir(pattern = "[[:digit:]]\\.csv"))))]` – Cath Sep 22 '20 at 13:48
  • Just an FYI - this has nothing to do with regex. Regex is about pattern matching, not sorting. (Though you could implement a solution with regex, it would be much safer to use a robust, tested solution like those Rui recommends.) – Gregor Thomas Sep 22 '20 at 13:48
  • @Cath Thank you, but unfortunately this does not work. It imports the CSV files as 0.csv, 10.csv, 20.csv etc. – R_hub Sep 22 '20 at 13:59
  • @RuiBarradas Thank you very much, I will check that! – R_hub Sep 22 '20 at 14:00
  • *"Order them"* is not the same as *"rename them"*. – r2evans Sep 22 '20 at 14:00
  • @GregorThomas Thank you, but I thought the pattern matching of `[[:digit:]]` could be part of the problem that R does not sort it the "correct" way :) – R_hub Sep 22 '20 at 14:03
  • @R_hub I guessed the pattern of your file names, but it seems I guessed wrong and the regex would need to be modified slightly to work. Anyway, it's better to go with natural sorting as suggested by Rui and Gregor. – Cath Sep 22 '20 at 14:08
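
The numeric ordering discussed in the comments above can be sketched in base R as follows. This is only a minimal illustration: the file names are a hypothetical vector standing in for the result of `dir()`, and `file_numbers` is a name introduced here. It assumes every name is just a number followed by `.csv`.

```r
# Hypothetical file names, in the alphabetical order described in the question
list_Candidates <- c("0.csv", "1.csv", "10.csv", "11.csv", "2.csv", "50.csv")

# Strip the ".csv" extension and convert what is left to a number
file_numbers <- as.numeric(sub("\\.csv$", "", list_Candidates))

# Reorder the file names by that number instead of alphabetically
list_Candidates <- list_Candidates[order(file_numbers)]
list_Candidates
# "0.csv" "1.csv" "2.csv" "10.csv" "11.csv" "50.csv"
```

The `str_sort` function from `stringr` with `numeric = TRUE`, as mentioned by Rui Barradas, is an alternative that should also cope with names mixing letters and digits.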

1 Answer

Why not try to generate the list of file names another way? If the file names are sequential, starting at 1 and incrementing by 1, then you can try the following. It does not require you to know the number of files ahead of time, and it will continue working when new files are added.

list_candidates <- sapply(1:length(dir(pattern = "[[:digit:]].csv")), 
                          function(x) paste0(x, ".csv"))
Ben Norris
  • `paste` is vectorized, so this is a long way to write `paste0(1:length(dir(...)), ".csv")`. No `sapply` or anonymous function needed. – Gregor Thomas Sep 22 '20 at 13:52
  • Yes, I already had a solution with `paste`. However, the CSV numbers start at zero, and sometimes there may be no zero file or some of the numbers may be missing. Therefore, I thought there would be a different way of defining `[[:digit:]]` such that it detects which files are in the folder and orders them numerically. – R_hub Sep 22 '20 at 14:11
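
One possible way to cope with a missing 0.csv or gaps in the numbering is to take the candidate number from the file name itself rather than from the loop index. The sketch below is an assumption-laden illustration, not the asker's code: it writes a few small example files to a temporary folder, uses base `read.csv` in place of `fread`, and the names `tmp`, `files` and `candidate` are introduced here.

```r
# Create small example files in a temporary folder (hypothetical data);
# 1.csv is deliberately missing to mimic a gap in the numbering.
tmp <- tempfile("csv_demo")
dir.create(tmp)
for (n in c(0, 2, 10)) {
  write.csv(data.frame(x = 1:2, y = n),
            file.path(tmp, paste0(n, ".csv")),
            row.names = FALSE)
}

# Keep only names that consist of digits followed by ".csv"
files <- dir(tmp, pattern = "^[[:digit:]]+\\.csv$")

# Take the candidate number from the file name and sort by it
candidate <- as.integer(sub("\\.csv$", "", files))
files <- files[order(candidate)]
candidate <- sort(candidate)

# Read each file and tag it with its own number, not with its position
ldf <- lapply(seq_along(files), function(k) {
  df <- read.csv(file.path(tmp, files[k]))
  df$Candidate <- candidate[k]
  df
})

# Candidate number stored in each data frame
sapply(ldf, function(df) df$Candidate[1])
```

Because the `Candidate` column comes from the file name, it stays correct whichever files happen to be present in the folder.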