I am unable to remove a backslash from a string in R. Input:

text="tweets\302"

Tried solution:

gsub('[[:punct:]]',"",text)

Expected result:

tweets302

Actual result:

"tweets\302"

I tried following this answer:

Removing backslash from strings in R

What am I doing wrong?

sanjay
  • Have you tried `gsub(..., fixed = TRUE)`, so that the pattern is interpreted as a string literal? – Akhil Nair Apr 29 '17 at 08:54
  • Possible duplicate of [how to replace single backslash in R](http://stackoverflow.com/questions/25424382/how-to-replace-single-backslash-in-r) – Tonio Liebrand Apr 29 '17 at 08:58
  • @AkhilNair That doesn't work: `gsub("[[:punct:]]","",text,fixed = TRUE)` still returns `[1] "tweets\302"`. – sanjay Apr 29 '17 at 09:02
  • Are you actually entering `text="tweets\302"` (which is equivalent to `text <- "tweets\302"`)? If so, the problem is that the backslash in that definition is interpreted as the escape character. You probably meant to enter `text="tweets\\302"` instead. And you should not use `fixed=TRUE` in the `gsub` call. – user1310503 Apr 29 '17 at 09:42
  • I am reading it from a file, where it appears that way. – sanjay Apr 29 '17 at 10:17
  • Do you mean you are reading the file with your eyes, or the computer is reading the file using some command like `source` or `readLines`? In any case, after reading from the file, what is displayed when you enter `text` and press Enter? If it is `[1] "tweets\\302"` then the string contains a non-escape backslash and your `gsub` call should work. If it is `[1] "tweets` followed by some strange symbol, then the backslash was interpreted as the escape character and the string does not contain a backslash. – user1310503 Apr 29 '17 at 12:05
  • Entering `text` displays `[1] "tweets\302"`. – sanjay Apr 30 '17 at 08:18
  • How exactly was `text` created? For example, did you use `source` to run a script in which one line was `text="tweets\302"`? Also, what do you get when you enter `nchar(text)`? – user1310503 Apr 30 '17 at 09:24
  • I used `readLines` to read a file of tweets and then tried to clean it. `nchar(text)` returns `[1] 7`. – sanjay Apr 30 '17 at 11:01
  • How **exactly** was `text` created? You need to give **full detail** so that another person can reproduce what you did. Then the explanation will probably become clear. `nchar` being 7 shows that `\302` is a single character, not four characters. – user1310503 Apr 30 '17 at 12:50
  • I pulled the data from Twitter with the twitter4j API in Java and wrote it to a file, then read that file into R using `readLines`. Yes, it seems that `\302` is a single character. – sanjay May 01 '17 at 17:59
  • Did the file contain just `tweets\302`, or did it contain `text="tweets\302"`? – user1310503 May 02 '17 at 06:46
  • Just `tweets\302`. – sanjay May 02 '17 at 07:20
  • OK. I made a text-file called myFile.txt that just contained `tweets\302`, and then did `x <- readLines("myFile.txt"); nchar(x)`, and the result was 10. (This happened whether the encoding was ANSI or UTF-8.) What do you get when you do exactly this? – user1310503 May 02 '17 at 13:04
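
The distinction drawn in the comments above can be checked directly in R. The following is only a minimal sketch, not a confirmed diagnosis of the asker's file: the variable names `escaped`, `literal`, `bytes` and `ascii_only` are made up for illustration, and dropping every byte above 127 is an assumption about what "cleaning" should mean here (it would also discard legitimate non-ASCII characters).

```r
# Case 1: "\302" in an R string literal is an octal escape for ONE byte (0xC2).
# The string contains no backslash, so there is nothing for gsub() to remove.
escaped <- "tweets\302"
nchar(escaped, type = "bytes")        # 7 bytes: "tweets" plus one extra byte
as.integer(charToRaw(escaped))[7]     # 194, i.e. octal 302

# Case 2: "\\" is a real backslash, followed by the digits 3, 0 and 2.
literal <- "tweets\\302"
nchar(literal, type = "bytes")        # 10
gsub("[[:punct:]]", "", literal)      # "tweets302"
gsub("\\", "", literal, fixed = TRUE) # "tweets302"

# Hypothetical clean-up for case 1: keep only bytes in the ASCII range.
bytes <- charToRaw(escaped)
ascii_only <- rawToChar(bytes[as.integer(bytes) < 128L])
ascii_only                            # "tweets"
```

If `nchar(text)` is 7 rather than 10, the string is in the first case: the `\302` shown at the console is just how R prints a byte it cannot display, and the expected result `tweets302` cannot be produced by deleting a backslash.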

0 Answers