
Suppose I have a string that contains the following characters:

"\"------------080209060700030309080805\""

I want to use the gsub function in R to remove the "\ and \" parts and keep only the following characters:

"------------080209060700030309080805\"

Could anyone help me figure out how to do this properly?

Ronak Shah
tql66
    `"------------080209060700030309080805\"` as result looks like a typo: Do you really want the final backslash to be part of your result? Please edit your question... – R Yoda Feb 15 '17 at 07:04
  • Please show us in plain text what the input and output should be. Don't worry about escaping in R. – Tim Biegeleisen Feb 15 '17 at 07:23
  • The input looks exactly like this: "\"------------080209060700030309080805\"". It actually comes from an email spam/ham dataset in a data science book. I want to use gsub to grab only the "------------080209060700030309080805" part, which means I want to get rid of the backslashes, but so far I haven't found a good way to do this. – tql66 Feb 17 '17 at 07:26

2 Answers


Edit 1: Fixed a bug (two backslashes are required to create one backslash in a string):

s <- '\\"------------080209060700030309080805\\"'
s
gsub('\\"', "", s, fixed = TRUE)

results in

> s <- '\\"------------080209060700030309080805\\"'
> s
[1] "\\\"------------080209060700030309080805\\\""
> gsub('\\"', "", s, fixed = TRUE)
[1] "------------080209060700030309080805"
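Because fixed = TRUE removes every occurrence of the two-character sequence backslash-quote, a pair in the middle of the string would be dropped as well. A minimal sketch, assuming you only want to strip the pair at the two ends; the variable names and the sample string `mid` are illustrative, not from the question:

```r
# The string from the answer: a literal backslash + quote at each end
s <- '\\"------------080209060700030309080805\\"'

# fixed = TRUE treats the pattern as plain text, so no regex escaping is needed
all_removed <- gsub('\\"', "", s, fixed = TRUE)

# Regex alternative that strips the pair only at the start (^) or the end ($).
# '\\\\"' in R source is the regex \\" : an escaped backslash followed by a quote.
ends_only <- gsub('^\\\\"|\\\\"$', "", s)

# A hypothetical string with an extra pair in the middle shows the difference
mid <- '\\"abc\\"def\\"'
gsub('\\"', "", mid, fixed = TRUE)   # "abcdef"
gsub('^\\\\"|\\\\"$', "", mid)       # "abc\\\"def"  (inner pair kept)
```

Both approaches give the same result on the string from the question; they only differ when the sequence also appears inside the string.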

Please note that a single backslash in an R string literal starts an escape sequence and is NOT part of the string itself:

> charToRaw('\\"')
[1] 5c 22
> charToRaw('\"')
[1] 22
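The same point can be checked with nchar(): the escaping backslash adds nothing to the string's length. It also suggests a second reading of the question: if print() shows the data as "\"------------080209060700030309080805\"", the backslashes may only be display escapes and the string itself may contain plain quote characters. A sketch under that assumption (not confirmed by the asker), where removing the quotes is enough:

```r
nchar('\\"')   # 2: a backslash and a quote
nchar('\"')    # 1: only the quote; the backslash was just an escape

# Assumption: the data contains plain quote characters, and the backslashes
# seen in the question only come from print() escaping them
x <- "\"------------080209060700030309080805\""
print(x)                        # [1] "\"------------080209060700030309080805\""
cat(x, "\n")                    # "------------080209060700030309080805"
gsub('"', "", x, fixed = TRUE)  # "------------080209060700030309080805"
```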

Therefore you have to type two backslashes in the quoted string to create one backslash internally. If you print this string, the backslash is escaped again, which looks confusing:

> print('\\"')
[1] "\\\""

If you want to print the unescaped content of the string, use cat instead of print:

> cat('\\"')
\"
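A small sketch comparing the displays. writeLines() is another base R function that, like cat(), writes the raw content (and appends a newline); capture.output() is used here only to turn each display into a string so the difference can be inspected:

```r
s <- '\\"'          # two characters: a backslash and a quote

print(s)            # escaped representation: [1] "\\\""
cat(s, "\n")        # raw content: \"
writeLines(s)       # raw content: \" followed by a newline

# capture.output() collects the printed text as a character vector
shown_by_print <- capture.output(print(s))
shown_by_writeLines <- capture.output(writeLines(s))
nchar(shown_by_print)       # 10: includes "[1] ", the quotes and the extra escapes
nchar(shown_by_writeLines)  # 2: just the backslash and the quote
```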

For more details, see the R help page ?"'" (also available as ?Quotes):

Character constants

Single and double quotes delimit character constants. They can be used interchangeably but double quotes are preferred (and character constants are printed using double quotes), so single quotes are normally only used to delimit character constants containing double quotes.

Backslash is used to start an escape sequence inside character constants. Escaping a character not in the following table is an error.

Single quotes need to be escaped by backslash in single-quoted strings, and double quotes in double-quoted strings.

\n          newline
\r          carriage return
\t          tab
\b          backspace
\a          alert (bell)
\f          form feed
\v          vertical tab
\\          backslash \
\'          ASCII apostrophe '
\"          ASCII quotation mark "
\`          ASCII grave accent (backtick) `
\nnn        character with given octal code (1, 2 or 3 digits)
\xnn        character with given hex code (1 or 2 hex digits)
\unnnn      Unicode character with given code (1--4 hex digits)
\Unnnnnnnn  Unicode character with given code (1--8 hex digits)
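A few rows of this table can be checked the same way as in the answer above, by comparing raw bytes or equivalent strings. A short sketch; the choice of example characters is arbitrary:

```r
# Each escape sequence in the table produces a single character
identical(charToRaw("\t"), as.raw(9))     # TRUE: tab is byte 09
identical(charToRaw("\\"), as.raw(0x5c))  # TRUE: one backslash, byte 5c
identical("\101", "A")                    # TRUE: octal 101 = 65 = "A"
identical("\x41", "A")                    # TRUE: hex 41 = 65 = "A"
identical("\u0041", "\U00000041")         # TRUE: Unicode code point U+0041 ("A")

# A backslash before a character not in the table is a parse error
res <- try(parse(text = '"\\d"'), silent = TRUE)
inherits(res, "try-error")                # TRUE
```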

R Yoda
string <- "\\------------080209060700030309080805\\"
string <- gsub("^\\\\(.*)\\\\$", "\\1", string)

Notes: The pattern, as the regex engine sees it, is ^\\(.*)\\$, which matches everything between a leading and a trailing backslash. It therefore only matches strings that both begin and end with a backslash. We write four backslashes (\\\\) in the R string to represent one literal backslash in the gsub() pattern, because the backslash has to be escaped twice: once for R's string parser and a second time for the regex engine.
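Since R 4.0.0, raw string literals r"(...)" remove the first layer of escaping: R no longer treats backslashes inside them as escapes, so only the regex engine's escaping remains. A sketch of the same pattern written both ways, assuming R >= 4.0.0:

```r
# Requires R >= 4.0.0, which introduced raw string literals r"(...)"
string <- "\\------------080209060700030309080805\\"

# Inside a raw string, \\ reaches the regex engine unchanged and means
# "one literal backslash"
pat_raw <- r"(^\\(.*)\\$)"
pat_escaped <- "^\\\\(.*)\\\\$"
identical(pat_raw, pat_escaped)   # TRUE

# \\1 in the replacement refers to the text captured by (.*)
gsub(pat_raw, "\\1", string)      # "------------080209060700030309080805"
```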

Tim Biegeleisen