
I have a string that looks like:

str<-"a\f\r"

I'm trying to remove the backslashes but nothing works:

gsub("\","",str, fixed=TRUE)
gsub("\\","",str)
gsub("(\)","",str)
gsub("([\])","",str)

...basically all the variations you can imagine. I have even tried the str_replace_all function from stringr. ANY HELP??

I'm using R version 3.1.1; Mac OSX 10.7; the dput for a single string in my vector of strings gives:

dput(line)
"ud83d\ude21\ud83d\udd2b"

I imported the file using readLines from a standard .txt file. The content of the file looks something like: got an engineer booked for this afternoon \ud83d\udc4d all now hopefully sorted\ud83d\ude0a I m going to go insane ud83d\ude21\ud83d\udd2b in utf8towcs …

Thanks.

zx8754
Tavi
  • related http://stackoverflow.com/questions/11806501/backslash-in-r-string – TooTone Aug 21 '14 at 10:50
  • @TooTone it is not quite the same problem. I've worked out dealing with double slashes, but I couldn't find any solutions to single slashes at all. – Tavi Aug 21 '14 at 12:30
  • @maryam `str<-"this\is\my\string"` can't be read into R because `\i`, `\m` and `\s` are stored letters in regex, so your example isn't reproducible (and confusing). You could change it to something like `str <- "a\f\r"` – David Arenburg Aug 21 '14 at 12:59
  • @DavidArenburg okay i'll do that – Tavi Aug 21 '14 at 13:06
  • @DavidArenburg That doesn't contain any backslashes either. You need to escape them if you're inputting them from the keyboard: `str <- "a\\f\\r"` – Hong Ooi Aug 21 '14 at 13:10
  • @DavidArenburg I've deleted my answer which addressed the original version of the question. – TooTone Aug 21 '14 at 13:13
  • @TooTone, seems like third exactly the same answer was just posted... – David Arenburg Aug 21 '14 at 13:15
  • @HongOoi I'm not sure about your edit to the post. I think we should leave it to the OP to clarify whether it's "this\\is\\my\\string" or "a\f\r" as David suggested. – TooTone Aug 21 '14 at 13:17
  • I've rolled it back to the original. – Hong Ooi Aug 21 '14 at 13:20
  • @HongOoi it's like David suggested, "a\f\r"; all I want to do is remove the darn single backslashes – Tavi Aug 21 '14 at 13:23
  • @maryam You have to be careful. If you want to have a string that contains a backslash, you have to type _two_ backslashes, because \ is a special character to the R parser. For example `str <- "a\\f\\r"` contains the string `'a\f\r'` (two single backslashes). Conversely, `str <- "a\f\r"` contains _no_ backslashes: it has 3 characters, `a`, `\f` and `\r`. – Hong Ooi Aug 21 '14 at 13:25
  • @HongOoi but I don't want a string that contains backslashes in the first place. I just got a messed-up text dataset that I'm trying to clean up, and it has all sorts of weird characters in it. I take it there is NO way to deal with single backslashes then? – Tavi Aug 21 '14 at 13:35
  • It sounds like your problem is not actually backslashes, but mangled text encoding. If R encounters a character it doesn't know how to print, you'll see lots of backslashes over the place, representing hex codes. Post an actual sample of your code and data, using `dput`. – Hong Ooi Aug 21 '14 at 13:39
  • @HongOoi alright, but what is dput? sorry – Tavi Aug 21 '14 at 15:54
  • See `?dput`. It prints something that you can cut and paste into the edit window here on StackOverflow, that will show R's internal representation of your data. – Hong Ooi Aug 21 '14 at 16:00
  • @HongOoi > dput(line) gives "ud83d\ude21\ud83d\udd2b" where line is the name of my string – Tavi Aug 21 '14 at 16:04
  • Put that into your question, along with the code you used to read the file and the OS you're using. Each `\uXXXX` represents a Unicode character which R doesn't know how to handle in your locale/encoding. – Hong Ooi Aug 21 '14 at 16:08
  • Can you also post a fragment of the file you're reading, and the code you used to read it with? – Hong Ooi Aug 21 '14 at 16:19
  • It seems like you have lots of gibberish there, try `gsub("[^A-Za-z0-9]", "", "ud83d\ude21\ud83d\udd2b")` maybe? – David Arenburg Aug 21 '14 at 16:27
  • @HongOoi there you go. thanks – Tavi Aug 21 '14 at 16:35
  • @DavidArenburg it works, David, but only the first part of the string is returned: "ud83d" – Tavi Aug 21 '14 at 16:37
  • `utf8towcs`... you're not working with funky foreign languages are you? – Hong Ooi Aug 21 '14 at 16:38
  • Yeah that looks like an encoding issue to me. – Dason Aug 21 '14 at 16:41
  • Can you upload the file somewhere? It looks like whatever editor you're pasting it from has already turned the weird characters into `\xxxx` form. – Hong Ooi Aug 21 '14 at 16:42
  • @HongOoi haha noooo it's all English :) actually David's idea to substitute everything besides the standard alphabet has the potential to work, but for now it's not doing exactly what it's supposed to do – Tavi Aug 21 '14 at 16:42
  • Right, that's because each `\uXXXX` is actually _one_ character. – Hong Ooi Aug 21 '14 at 16:51
  • @HongOoi one character? oh yeah, now I see… I guess the problem is solved then; I'll use gsub("[^A-Za-z0-9]", "", "ud83d\ude21\ud83d\udd2b"). Thanks a lot, Hong. Do you know how to mark a comment as an answer, so that others might find it useful? – Tavi Aug 21 '14 at 17:07
  • Like I said - it's an encoding issue ;) – Dason Aug 21 '14 at 18:31
  • @Dason yes it is, glad it's over now; I've been on this for the past 6 hours. Thanks to everyone that contributed – Tavi Aug 21 '14 at 18:34
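To make Hong Ooi's "one character" point concrete, here is a minimal base-R sketch. To keep it portable it uses `\u263a` (a smiley) as an assumed stand-in for one of the emoji in the question, rather than the surrogate escapes themselves. The escape becomes a single character, so there is no backslash to remove, and David's pattern drops that whole character.

```r
# "\u263a" is a stand-in for one of the emoji in the question.
x <- "ud83d\u263a"

nchar(x)                         # 6: "u", "d", "8", "3", "d" and the smiley
grepl("\\", x, fixed = TRUE)     # FALSE: there is no backslash in x

# The character class removes the smiley as a whole character,
# which is why only "ud83d" survived in the question.
gsub("[^A-Za-z0-9]", "", x)      # "ud83d"
```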

5 Answers


One quite universal solution is

gsub("\\\\", "", str)

Thanks to the comment above.
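A small sketch of why four backslashes are needed, assuming the string really contains backslashes (typed with doubled `\\`). The R parser turns `"\\\\"` into the two-character regex `\\`, which matches one literal backslash; `fixed = TRUE` with `"\\"` does the same without regex; and a lone `"\\"` as a regex is an error.

```r
str <- "a\\f\\r"    # typed with doubled backslashes: a \ f \ r
writeLines(str)     # prints a\f\r
nchar(str)          # 5

# Regex route: the source text "\\\\" is the regex \\ ,
# which matches one literal backslash.
gsub("\\\\", "", str)                # "afr"

# Literal route: with fixed = TRUE, "\\" is one backslash matched as-is.
gsub("\\", "", str, fixed = TRUE)    # "afr"

# Without fixed = TRUE, "\\" is a lone regex escape and gsub() fails.
res <- tryCatch(suppressWarnings(gsub("\\", "", str)),
                error = function(e) "error")
res                                  # "error"
```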

JelenaČuklina

When inputting backslashes from the keyboard, always escape them.

str <- "this\\is\\my\\string"      # note doubled backslashes -> 'this\is\my\string'
gsub("\\", "", str, fixed = TRUE)   # "\\" is one literal backslash -> "thisismystring"

str2 <- "a\\f\\r"                   # doubled again -> 'a\f\r'
gsub("\\", "", str2, fixed = TRUE)  # -> "afr"

Note that if you do

str <- "a\f\r"

then str contains no backslashes. It consists of the 3 characters a, \f (a form feed, which is not normally printable except as \f) and \r (a carriage return, likewise).
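A quick way to tell which case you are in is to count characters and look at the code points; a minimal base-R sketch:

```r
x <- "a\f\r"     # escapes: "a", form feed, carriage return
y <- "a\\f\\r"   # doubled:  "a", "\", "f", "\", "r"

nchar(x)                             # 3
nchar(y)                             # 5
utf8ToInt(x)                         # 97 12 13
grepl("\\", x, fixed = TRUE)         # FALSE: no backslash to remove
gsub("\\", "", y, fixed = TRUE)      # "afr"
```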

And just to head off a possible question: if your data was read from a file, the file doesn't have to have doubled backslashes. For example, if you have a file test.txt containing

a\b\c\d\e\f

and you do

str <- readLines("test.txt")

then str will contain the string a\b\c\d\e\f as you'd expect: 6 letters separated by 5 single backslashes. But you still have to type doubled backslashes if you want to work with it.

str <- gsub("\\", "", str, fixed=TRUE)  # now contains abcdef
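The file case can be reproduced end to end; this sketch writes its own temporary file (instead of the test.txt above) so it runs anywhere:

```r
tf <- tempfile(fileext = ".txt")
writeLines("a\\b\\c\\d\\e\\f", tf)   # the file holds: a\b\c\d\e\f

str <- readLines(tf)
nchar(str)                           # 11: 6 letters and 5 backslashes
str <- gsub("\\", "", str, fixed = TRUE)
str                                  # "abcdef"
unlink(tf)
```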

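From the dput, it looks like what you've got there is UTF-16 encoded text, which probably came from a Windows machine. According to …, it encodes glyphs in the Supplementary Multilingual Plane: each \uXXXX\uXXXX pair is a surrogate pair for a single character, here an emoji (\ud83d\ude21 is U+1F621, a pouting face). I'll guess that you need to tell R the file is UTF-16 when you read it in. Note that the encoding argument of readLines only marks strings as Latin-1 or UTF-8 and does not re-encode them, so declare the encoding on the connection instead: readLines(file(path, encoding = "UTF-16")), where path is your file.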
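If the file really is UTF-16, the re-encoding has to happen on the connection. A sketch, assuming little-endian UTF-16 without a byte-order mark; the one-line test file is hypothetical data that the example writes itself:

```r
tf <- tempfile(fileext = ".txt")
# Hypothetical test data: one ASCII line encoded as UTF-16LE bytes.
bytes <- iconv("got an engineer booked\n", from = "UTF-8", to = "UTF-16LE",
               toRaw = TRUE)[[1]]
writeBin(bytes, tf)

# The encoding is declared on the connection, which re-encodes on input.
con <- file(tf, encoding = "UTF-16LE")
txt <- readLines(con)
close(con)
txt                                  # "got an engineer booked"
unlink(tf)
```

For a file that starts with a byte-order mark, "UTF-16" rather than "UTF-16LE" is the usual choice.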

Hong Ooi
  • Did you just adapt the question to your answer or am I missing something? She has `str<-"a\f\r"` string, not `str <-"this\\is\\my\\string"` – David Arenburg Aug 21 '14 at 13:17
  • The original, before your edits, was `str <- "this\is\my\string"`. This isn't something you can enter into R from the keyboard. If it was meant to represent the string `"this\is\my\string"`, you have to escape the backslashes. – Hong Ooi Aug 21 '14 at 13:19
  • Actually I didn't edit it to `str <- "a\f\r"`, the OP did. The error was caused not by the backslashes but by the `\i`, `\m` and `\s` – David Arenburg Aug 21 '14 at 13:20
  • True, but it was on your suggestion. – Hong Ooi Aug 21 '14 at 13:21
  • Because the error wasn't caused by backslashes but by stored letters, so her example wasn't reproducible. You have 10K rep, you can see deleted posts, don't you find any similarity between your answer and the other two deleted? – David Arenburg Aug 21 '14 at 13:21
  • Please, any clues, anyone? I've wasted so much time trying to do this – Tavi Aug 21 '14 at 13:26
  • @DavidArenburg The deleted answers would have worked fine, there was no reason to delete them. – Hong Ooi Aug 21 '14 at 13:29
  • If you adjust the question to the answer (like you did), they would work fine, yes. – David Arenburg Aug 21 '14 at 13:30
  • I didn't adjust the question, which is "how to replace a single backslash". All the answers do exactly this; or they did, until people tried to replace a nonexistent backslash.... I can't undelete an answer, but martin and tootone are free to undelete if they wish. – Hong Ooi Aug 21 '14 at 13:42
  • Has this ever been resolved? @HongOoi here is a reproducible example that your solution does not fix: `tst <- "[179,\"3 min\"]"` `gsub("\\", "", tst, fixed=TRUE)` – jon Oct 01 '19 at 19:11
  • @jon once you understand how many backslashes there are in your string, all will become clear – Hong Ooi Oct 02 '19 at 00:17
  • This does not work in R 4.X . There's still a mess with character strings getting interpreted, and "raw" data manipulations such as the new `r"('this\string')"` – Carl Witthoft Feb 17 '22 at 14:30
  • @CarlWitthoft once you understand how many backslashes there are in your string, all will become clear – Hong Ooi Feb 17 '22 at 14:41
  • @HongOoi I do understand, e.g. if I have `this \f has none` I can search for `\\\\f` . But I cannot search purely for "\\\\"because R lumps `\f` as a single special character when interpreting a character string. What the R-lang needs is, maybe, a new class "literally_characters" so that nothing gets handled by the console interpreter. – Carl Witthoft Feb 17 '22 at 15:26
  • @CarlWitthoft once you understand how many backslashes there are in your string, all will become clear – Hong Ooi Feb 17 '22 at 15:38
  • @HongOoi , Yes, I know that the `\f` is a single character and that the console does not "believe" there are any backslash characters in my character string. The point is that we shouldn't have to search for "\\\\f" and "\\\\g" and "\\\\h" ad infinitum. So, backing up: how do we handle a character string **that we've been given from elsewhere** which has single backslashes that we have to modify? There's no guarantee it was read from a text file (thus bypassing the console parser) – Carl Witthoft Feb 17 '22 at 16:03
  • @CarlWitthoft "read from a text file"? What does that have to do with the price of tea in China? – Hong Ooi Feb 18 '22 at 15:09
  • @HongOoi if you try it, you'll see that R's text reader preserves the backslash as a separate character – Carl Witthoft Feb 18 '22 at 15:54
  • @CarlWitthoft and? – Hong Ooi Feb 18 '22 at 22:13
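One possible answer to Carl Witthoft's question, not suggested in the thread: base R's encodeString() writes non-printable characters such as a form feed back out as their visible escapes, after which the backslash is an ordinary character. A sketch, assuming the characters you care about are the control characters R prints as \f, \r and so on:

```r
x <- "this \f has none"              # contains a form feed, no backslash
grepl("\\", x, fixed = TRUE)         # FALSE

# encodeString() writes non-printable characters as visible escapes,
# so the form feed becomes the two characters "\" and "f".
shown <- encodeString(x)
nchar(x)                             # 15
nchar(shown)                         # 16
gsub("\\", "", shown, fixed = TRUE)  # "this f has none"
```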

Since there isn't any direct way to deal with single backslashes, here's the closest solution to the problem, as provided by David Arenburg in the comments section:

gsub("[^A-Za-z0-9]", "", str) # remove everything except ASCII letters and digits (spaces included)
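A sketch of what this pattern does to a line like the one in the question. The sample line is hypothetical, with `\u263a` standing in for an emoji, and the space-preserving variant is an assumption, not something from the thread:

```r
# Hypothetical sample line; "\u263a" stands in for one of the emoji.
str <- "got an engineer booked \u263a all now hopefully sorted"

# David Arenburg's pattern: keeps only ASCII letters and digits,
# so the spaces disappear along with the emoji.
gsub("[^A-Za-z0-9]", "", str)
# "gotanengineerbookedallnowhopefullysorted"

# Variant (an assumption, not from the thread): keep spaces too,
# then squeeze the double space left where the emoji was.
out <- gsub("[^A-Za-z0-9 ]", "", str)
out <- gsub(" +", " ", out)
out
# "got an engineer booked all now hopefully sorted"
```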
David Arenburg
Tavi

This might be helpful :)

library(stringi)
stri_escape_unicode("ala\\ma\\kota")
## [1] "ala\\\\ma\\\\kota"
stri_unescape_unicode("ala\\ ma\\ kota")
## [1] "ala ma kota"
bartektartanus

As of R 4.0.0, you can use raw strings to avoid confusion with backslashes. Use the syntax r"(your_raw_expression)" (parentheses included):

str<-r"(ud83d\ude21\ud83d\udd2b)" #Equivalent of "ud83d\\ude21\\ud83d\\udd2b"
gsub(r"(\\)", "", str)
# [1] "ud83dude21ud83dudd2b"
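A minimal sketch of the same idea on the question's `a\f\r` example (requires R 4.0.0 or later). Inside `r"( ... )"` nothing is interpreted, so the pattern can be written either as a fixed single backslash or as the two-character regex `\\`:

```r
# Requires R >= 4.0.0. Everything between r"( and )" is taken literally.
s <- r"(a\f\r)"
identical(s, "a\\f\\r")                  # TRUE: two real backslashes
nchar(s)                                 # 5

# A fixed pattern holding one backslash...
gsub(r"(\)", "", s, fixed = TRUE)        # "afr"
# ...or the two-character regex \\ that matches one backslash.
gsub(r"(\\)", "", s)                     # "afr"
```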
Maël
  • The only problem here is: what if I have a string `mystring <- 'ud83d\ud21'` . I can't insert that inside the `r"(` construct because nothing gets interpreted inside that. – Carl Witthoft Feb 17 '22 at 14:33