I want to change "[" to "(" in a data.frame (class is string), but I get the following error:

Error in gsub("[", "(", df) : invalid regular expression '[', reason 'Missing ']''

Doing the reverse works perfectly:

df <- gsub("]",")", df)

All "]" were replaced in the data.frame df.

So in essence, this is the problem:

df <- gsub("[","(", df)

Error in gsub("[", "(", df) : invalid regular expression '[', reason 'Missing ']''

Can anyone help fix the code, or is there an alternative function to gsub that can accomplish the same thing?

NelsonGon
  • Square brackets and parentheses are regex _metacharacters_ which have a special meaning when inside a regex pattern. So, you should escape them. I don't know why the second version is working though, this is interesting. – Tim Biegeleisen Jul 25 '19 at 14:03
  • When using gsub (or any other function that uses a pattern, like grep), you have to take into account whether that pattern includes metacharacters and, if so, escape them by adding '\\' in front. – Hart Radev Jul 25 '19 at 14:03
  • This is more appropriate: https://stackoverflow.com/questions/27721008/how-do-i-deal-with-special-characters-like-in-my-regex – NelsonGon Jul 25 '19 at 14:36
  • Also: https://stackoverflow.com/questions/38828620/how-to-remove-strange-characters-using-gsub-in-r/50398057#50398057 – acylam Jul 25 '19 at 14:51

2 Answers

The [ is a metacharacter, so we need either fixed = TRUE or to escape it as \\[

gsub("[", "(", df, fixed = TRUE)
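As a minimal sketch of both options, assuming a character vector `x` and a small data frame `df2` of character columns (both are made-up illustrations, not the asker's data). Passing a whole data frame to gsub coerces it with as.character, so the sketch applies the replacement column by column instead. It also shows base R's chartr as one possible regex-free alternative.

```r
# Illustrative input (hypothetical, not from the question)
x <- c("a[1]", "b[2]")

# Option 1: treat the pattern as a literal string, no regex involved
res_fixed <- gsub("[", "(", x, fixed = TRUE)

# Option 2: keep regex matching but escape the metacharacter
res_escaped <- gsub("\\[", "(", x)

# Alternative without gsub: chartr translates characters one-for-one,
# so both kinds of bracket can be swapped in a single call
res_chartr <- chartr("[]", "()", x)

# For a data frame, apply the replacement to each column;
# assigning into df2[] keeps the data.frame structure
df2 <- data.frame(id = c("a[1]", "b[2]"), val = c("[x]", "y"),
                  stringsAsFactors = FALSE)
df2[] <- lapply(df2, function(col) gsub("[", "(", col, fixed = TRUE))
```

Here `res_fixed` and `res_escaped` should be identical, and `df2` is still a data frame afterwards.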
akrun

We can also use the hexadecimal representation of the ASCII character [ by prefixing its code with \\x:

gsub('\\x5B', '(', '[')
# [1] "("

Just a preference, but I find this more readable in cases where the metacharacters [ and ] are mixed with their literal/escaped versions. For example, I find this:

gsub('[\\x5B\\x5D]+', '(', ']][[[', perl = TRUE)

more readable than these:

gsub('[\\]\\[]+', '(', ']][[[', perl = TRUE)
# [1] "("

gsub('[][]+', '(', ']][[[', perl = TRUE)
# [1] "("

gsub('[\\[\\]]+', '(', ']][[[', perl = TRUE)
# [1] "("

especially when you have a long and complicated pattern.

I took the hex codes from the ASCII table at http://www.asciitable.com/

The obvious disadvantage is that you have to look up the hex code in the table.
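One possible way around the table lookup, sketched with base R only: utf8ToInt gives the code point of a single character and sprintf formats it as hex. The variable names here are illustrative.

```r
# Code point of each bracket, formatted as two-digit upper-case hex
hex_open  <- sprintf("%02X", utf8ToInt("["))
hex_close <- sprintf("%02X", utf8ToInt("]"))

# Build the character-class pattern from the computed codes;
# the resulting string is [\x5B\x5D]+
pattern <- paste0("[\\x", hex_open, "\\x", hex_close, "]+")

# Same call as in the example above, with the generated pattern
result <- gsub(pattern, "(", "]][[[", perl = TRUE)
```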

acylam