
I have a list of regexps which are used to produce some graphs. I'd like to save each graph with its corresponding regexp in the filename. Example:

re <- 'foo\\w{3}bar'
# ... produce a graph here and now need a filename
not_save <- paste0("prefix ", re, ".suffix")

But re needs to be cleaned of everything not allowed in filenames. I know this depends on the OS and filesystem, but I think that if it's a valid filename on Windows, it's valid everywhere.

I can substitute bad characters with gsub():

not_save_enough <- gsub('[$^*|{}()/:]', '_', re, perl=TRUE)

But I don't know all the bad characters, and I don't know how to replace \ and/or [ and ]. Substituting all bad characters with _ would be sufficient. Unfortunately, things like

 gsub('\Q\\E', '_', "Numbers are \d", perl=TRUE)

don't work, even with perl = TRUE, and produce

Error: '\Q' is an unrecognized escape in character string starting "'/\Q"

Is there a function like make_string_save_to_use_it_as_filename()?

How can I substitute \, [ and ] and other regexp metacharacters in strings?

dtrv
  • you need to double escape the `\Q` – Avinash Raj Dec 11 '14 at 13:42
  • @AvinashRaj Thanks! That helps! But do you know if there is a function like I mentioned in the question? Seems strange to prepare strings for usage as filenames by myself. – dtrv Dec 11 '14 at 14:16

2 Answers


I think you want something like this:

> re <- 'foo\\w{3}bar'
> not_save_enough <- gsub('[$^*|{}\\[\\]()/:\\\\]', '_', re, perl=TRUE)
> not_save_enough
[1] "foo_w_3_bar"
> re <- 'foo\\w{3}bar[foo]foo(buz)kj^jkj$jhh*foo|bar/hjh'
> not_save_enough <- gsub('[$^*|{}\\[\\]()/:\\\\]', '_', re, perl=TRUE)
> not_save_enough
[1] "foo_w_3_bar_foo_foo_buz_kj_jkj_jhh_foo_bar_hjh"

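In R, a literal backslash in a regex has to be written as four backslashes in the source (`\\\\`): the R parser reduces them to two, and the regex engine then reads those two as one escaped, literal backslash.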
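The two layers of escaping can be checked directly in the console. This is only a small sketch; the input strings are made up for illustration.

```r
# The R parser turns each "\\" in source code into one backslash in the string.
pattern <- "\\\\"             # the string itself holds two characters: \\
n <- nchar(pattern)           # 2

x <- "Numbers are \\d"        # the string itself holds: Numbers are \d
res1 <- gsub(pattern, "_", x) # the regex \\ matches one literal backslash

# The same rule applies inside a bracket expression: \\[ \\] \\\\ in source
# become \[ \] \\ for the regex engine, i.e. the class of [, ] and \.
res2 <- gsub("[\\[\\]\\\\]", "_", "a[b]c\\d", perl = TRUE)

print(n)     # 2
print(res1)  # "Numbers are _d"
print(res2)  # "a_b_c_d"
```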

Avinash Raj

How to substitute \ [ ] and other metacharacters in strings?

If the idea here is to replace bad characters, you may consider the POSIX class [[:punct:]]. In the ASCII range, this named class matches every character that is not a control character, not alphanumeric, and not a space:

!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~

So if you're wanting to replace each instance with an underscore you could do ...

fn <- gsub('[[:punct:]]', '_', 'foo\\w{3}bar')
# [1] "foo_w_3_bar"
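Wrapped in a function, this could look like the sketch below. The name `make_filename_safe` is hypothetical and not part of base R; replacing whitespace as well, and collapsing runs of bad characters into a single underscore, are assumptions about what is wanted rather than something the question requires.

```r
# Hypothetical helper (not a base R function): replace every run of
# punctuation or whitespace characters with a single underscore.
make_filename_safe <- function(x) {
  gsub("[[:punct:][:space:]]+", "_", x)
}

re <- "foo\\w{3}bar"
fn <- paste0("prefix_", make_filename_safe(re), ".suffix")
print(fn)  # "prefix_foo_w_3_bar.suffix"

other <- make_filename_safe("a b/c:d")
print(other)  # "a_b_c_d"
```

Note that the mapping is lossy: different regexps can collapse to the same filename, so a counter or hash may be needed if the names must be unique.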

The use of \Q and \E ensures that any character between them will be matched literally and not interpreted as a metacharacter by the regular expression engine. Also, in R the delimiter /.../ and the g (global) mode modifier syntax are invalid. Below is an example demonstrating the correct use:

x <- '[[[(((123]'
gsub('\\Q[[[(((\\E', '[', x, perl=TRUE)
# [1] "[123]"
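When the whole pattern is meant literally, the `fixed = TRUE` argument of `gsub()` is an alternative to `\\Q...\\E`, since it switches off regex interpretation entirely. A short sketch with made-up inputs:

```r
x <- "[[[(((123]"

# With fixed = TRUE the pattern is an ordinary string, so no regex escaping
# is needed for [ or (.
res_brackets <- gsub("[[[(((", "[", x, fixed = TRUE)

# "\\" in source is a single backslash in the string, matched literally here.
res_backslash <- gsub("\\", "_", "Numbers are \\d", fixed = TRUE)

print(res_brackets)   # "[123]"
print(res_backslash)  # "Numbers are _d"
```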

If you need to use modifiers, make sure perl=TRUE is set and use inline modifiers, i.e. (?ismx).
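As a brief illustration of inline modifiers, with strings invented for the example:

```r
# (?i) makes the rest of the pattern case-insensitive.
res_i <- gsub("(?i)foo", "_", "Foo FOO foo", perl = TRUE)

# (?x) makes the engine ignore unescaped whitespace inside the pattern,
# which lets a longer pattern be spaced out for readability.
res_x <- gsub("(?x) \\d+ \\. \\d+", "N", "pi is 3.14", perl = TRUE)

print(res_i)  # "_ _ _"
print(res_x)  # "pi is N"
```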

hwnd
  • I like the idea of using `[[:punct:]]` and will go with that. Thanks! Ok, `/.../g` is invalid syntax, I'll correct my question to not confuse someone. As Avinash mentioned, double escaping `\Q` works (`\\Q...\\E`). @hwnd – dtrv Dec 11 '14 at 15:29