I am having trouble running code that was written under RStudio 1.3.959 after migrating to a new PC and installing RStudio 1.4.1717. The same error appears when running the code via base R (4.1.0). Calling base R functions directly (grep, gregexpr, e.g. gregexpr("[:alpha:]+", "1234a")) produces no error message.

Code:

library(tidyverse)

data_files <- as.data.frame(list.files(data_folder)) 

data_files <- data_files %>%
  mutate(temp = data_files[,1]) %>%
  separate("temp",
           c("temp", "Trash"),
           sep = "\\.") %>%
  select(-"Trash") %>%
  separate("temp",
           c("run", "Trash"),
           sep = "[:alpha:]+", 
           remove = FALSE) %>%
  select(-"Trash") %>%
  separate("temp",
           c("Trash", "letters"),
           sep = "[:digit:]+") %>%
  select(-"Trash") %>%
  select("run", "letters") 

My data_folder contains csv files named with the pattern date-increment-letter.csv, e.g. 21021202a.csv.

Error message:

Error in gregexpr(pattern, x, perl = TRUE) : 
  invalid regular expression '[:alpha:]+'
In addition: Warning message:
In gregexpr(pattern, x, perl = TRUE) : PCRE pattern compilation error
    'POSIX named classes are supported only within a class'
    at '[:alpha:]+'

Reproducible example using dput:

data_files <- as.data.frame(list.files(icpms_folder))  
dput(head(data_files)) 

structure(list(`list.files(icpms_folder)` = c("21021202a.csv", 
                                              "21021202b.csv", 
                                              "21021202c.csv", 
                                              "21021203a.csv", 
                                              "21021203b.csv", 
                                              "21021203c.csv")), 
                 row.names = c(NA, 6L), class = "data.frame")

Could you please point out what is missing in my fresh installation?

Thank you in advance!

  • Just curious: Why do you separate and then drop the `Trash`-column instead of simply extracting what you want? – Martin Gal Jul 20 '21 at 15:51
  • Please share a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a small example of your data used, best using `dput(head(YOURDATA))`. Edit your question and put the `structure(...)`-output there. – Martin Gal Jul 20 '21 at 15:55
  • Finally: Try replacing `sep = "[:alpha:]+"` by `sep = "[[:alpha:]]+"`. – Martin Gal Jul 20 '21 at 15:59
  • FYI, you could just do `separate(df, temp, c("temp", NA), sep = "\\.")` and that will automatically delete the second column. But as Martin suggested, there is surely a more efficient way of achieving what you want if you could provide a reproducible data set. – Phil Jul 20 '21 at 16:19
  • @MartinGal, I believe that's the answer, please post? I have no idea why this would have worked before (maybe it wasn't really?) – Ben Bolker Jul 20 '21 at 16:23
  • @BenBolker Some expert on regex should post this with a proper explanation, which I actually can't give. – Martin Gal Jul 20 '21 at 16:27
  • @MartinGal: this part of the code was written quickly and without a second thought. It worked hundreds if not thousands of times to get the file list, which I then passed to my function to actually read the data. Your solution of replacing `sep = "[:alpha:]+"` by `sep = "[[:alpha:]]+"` worked like a charm, thanks! I have no clue why the first variant worked on my previous installation. – alexander baranov Jul 20 '21 at 17:07
  • Reproducible example using dput: `> data_files <- as.data.frame(list.files(icpms_folder)) > dput(head(data_files)) structure(list(`list.files(icpms_folder)` = c("21021202a.csv", "21021202b.csv", "21021202c.csv", "21021203a.csv", "21021203b.csv", "21021203c.csv")), row.names = c(NA, 6L), class = "data.frame")` – alexander baranov Jul 20 '21 at 17:09
  • It doesn't matter now, but actually you can *edit* your question and put the `dput` there. Posting it as a comment is... well... not so good. – Martin Gal Jul 20 '21 at 18:30
  • Yes, I know it wasn't good but for some reason the edit button was missing and now I can see it. Thanks for help! – alexander baranov Jul 23 '21 at 07:28

1 Answer

The answer to "why" is already in the error message: POSIX named classes are supported only within a class.

POSIX named classes are names such as [:digit:], [:alpha:], and so on.

By "class", the message author meant a character class (bracket expression), i.e. [...].

So the named class has to be wrapped in a bracket expression of its own:

sep = '[[:alpha:]]+'
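
The difference can also be checked outside of separate(). The following is only a minimal base R sketch, not part of the original code: the variable names (x, bad, good, literal_set) are made up for illustration, and it assumes the failing call is equivalent to the gregexpr(pattern, x, perl = TRUE) shown in the error message.

```r
x <- "21021202a"  # file name stem taken from the question's example data

# PCRE (perl = TRUE) refuses a POSIX class name that is not inside [...].
# The error is caught here so the script keeps running.
bad <- tryCatch(
  suppressWarnings(gregexpr("[:alpha:]+", x, perl = TRUE)),
  error = function(e) conditionMessage(e)
)

# With the outer brackets added, the same call compiles and finds the letter.
good <- regmatches(x, gregexpr("[[:alpha:]]+", x, perl = TRUE))[[1]]

# Without perl = TRUE the default engine accepts "[:alpha:]", but reads it as
# the literal set ':', 'a', 'l', 'p', 'h' rather than "any letter".
literal_set <- grepl("[:alpha:]", c("a", "b"))

print(bad)
print(good)
print(literal_set)
```

This is also consistent with the observation in the question that gregexpr("[:alpha:]+", "1234a") raises no error: without perl = TRUE the pattern is accepted, and it only matches there because "a" happens to be one of the literal characters in the set.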
Ryszard Czech