How to replace '+' using gsub() function in R

Question

I'm trying to remove the '+' character that appears inside one of the string elements of a data frame, but I can't find a way to do it.

Below is the data frame:

txtdf <- structure(list(ID = 1:9, Var1 = structure(c(1L, 1L, 1L, 1L, 4L, 
            5L, 5L, 2L, 3L), .Label = c("government", "parliament", "parliment", 
            "poli+tician", "politician"), class = "factor")), .Names = c("ID", 
            "Var1"), class = "data.frame", row.names = c(NA, -9L))
#  ID   Var1
#  1    government
#  2    government
#  3    government
#  4    government
#  5    poli+tician
#  6    politician
#  7    politician
#  8    parliament
#  9    parliment

I tried two approaches, but neither gave the expected result:

Way1

txtdf <- gsub("[:punct:]","", txtdf)
# [1] "goverme" "goverme" "goverme" "goverme" "oli+iia" "oliiia"  "oliiia" 
# [8] "arliame" "arlime"

I don't understand what's wrong here. I want the '+' character removed from the 5th element only, but every element has been altered as shown above.
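What is likely happening in Way1: when it is not wrapped in an outer pair of brackets, "[:punct:]" is read as an ordinary bracket expression listing the literal characters :, p, u, n, c and t, so gsub() deletes every one of those letters. The POSIX class only works inside another pair of brackets, "[[:punct:]]". The sketch below uses the column values typed out by hand as a plain character vector (an assumption for illustration) rather than the data frame itself.

```r
# Values of txtdf$Var1, typed out as a plain character vector for illustration
words <- c("government", "poli+tician", "politician", "parliament", "parliment")

# "[:punct:]" is a bracket expression listing ':', 'p', 'u', 'n', 'c', 't';
# each of those characters is removed, which mangles the words
wrong <- gsub("[:punct:]", "", words)

# "[[:punct:]]" is a bracket expression containing the POSIX class [:punct:],
# so only punctuation characters such as '+' are removed
right <- gsub("[[:punct:]]", "", words)

print(wrong)
print(right)
```

The first result reproduces the mangled output shown above (for example "oli+iia"), while the second leaves the letters intact and only drops the plus sign.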

Way2

txtdf<-gsub("*//+","",txtdf)
# [1] "government"  "government"  "government"  "government"  "poli+tician"
# [6] "politician"  "politician"  "parliament"  "parliment"

Here nothing changes at all. What I was trying to do was escape the + character using double slashes.
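The forward slash has no special meaning in a regular expression, so a pattern built from "//+" looks for slash characters, which never occur in this data, and therefore nothing is replaced. The regex escape character is the backslash, written "\\" inside an R string. A small check, using a made-up input "a//b" to show what "//+" actually matches:

```r
x <- "poli+tician"

# "//+" means: a '/' followed by one or more '/' characters
grepl("//+", x)        # FALSE: there are no slashes in x
grepl("//+", "a//b")   # TRUE: the pattern matches the slashes

# A backslash escape makes '+' literal; in an R string it is written "\\+"
gsub("\\+", "", x)
```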

Or just put it in a character class: `"[+]"`. Since `+` ("1 or more") has no special meaning inside a character class, it doesn't need to be escaped there. — LukStorms, May 14 '17 at 15:57
Or use the `fixed` argument: `gsub("+", "", txtdf$varname, fixed=TRUE)` — user2957945, May 14 '17 at 16:00
Note: Please always add code to your question that constructs the data (frame) the answer should operate on, to make it easier for us to answer (on SO this is often called a "minimal reproducible example", or MRE). THX :-) — R Yoda, May 14 '17 at 16:08
The solution is trivial, but your attempts are interesting. Are you trying to remove any punctuation or just plus signs? — Wiktor Stribiżew, May 14 '17 at 16:13
@Rahul - How silly of me: I was using forward slashes, and the backslash works. — Dileep Guntamadugu, May 18 '17 at 03:18
@LukStorms - Yes, it works using the character class. Another technique I've learned here. — Dileep Guntamadugu, May 18 '17 at 03:23
@RYoda - Noted, I'll certainly do that in my future questions. — Dileep Guntamadugu, May 18 '17 at 03:24
@WiktorStribiżew - I was trying to remove everything: punctuation and plus signs. — Dileep Guntamadugu, May 18 '17 at 03:24
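LukStorms' character-class suggestion can be checked directly. Inside square brackets "+" loses its quantifier meaning, so "[+]" matches a literal plus sign, and several characters that are special outside a class can be listed together in one class. The input "a+b*c?d" is made up for illustration.

```r
x <- c("poli+tician", "politician")

# Inside [...] the '+' is an ordinary character, so no escaping is needed
gsub("[+]", "", x)

# '+', '*' and '?' are all literal inside a bracket expression
gsub("[+*?]", "", "a+b*c?d")
```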

R Yoda · Accepted Answer · 2017-05-14T16:10:00.213

3

Simply replace it with fixed = TRUE (no regular expression needed), but you have to do the replacement for each column of the data.frame by specifying the column name:

txtdf <- data.frame(job = c("government", "poli+tician", "parliament"))
txtdf

gives

          job
1  government
2 poli+tician
3  parliament

Now replace the "+":

txtdf$job <- gsub("+", "", txtdf$job, fixed = TRUE)
txtdf

The result is:

         job
1 government
2 politician
3 parliament
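Because the replacement works column by column, one way to cover every text column at once is to loop over them with lapply(). This is a sketch rather than part of the original answer; it assumes only character or factor columns should be touched (so a numeric column such as ID is left alone). Factor columns come back as character vectors, because gsub() always returns character.

```r
txtdf <- data.frame(
  ID  = 1:3,
  job = c("government", "poli+tician", "parliament"),
  stringsAsFactors = TRUE
)

# Select the columns that hold text (character or factor)
is_text <- vapply(txtdf, function(col) is.character(col) || is.factor(col),
                  logical(1))

# Apply the fixed-string replacement to each selected column;
# gsub() coerces factors with as.character() and returns character
txtdf[is_text] <- lapply(txtdf[is_text], function(col) {
  gsub("+", "", col, fixed = TRUE)
})

str(txtdf)
```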

edited May 14 '17 at 16:10

answered May 14 '17 at 16:04

R Yoda


Using the "fixed = TRUE" argument to treat the pattern as a fixed string instead of a regular expression is new to me! Thank you – Dileep Guntamadugu, May 18 '17 at 03:27

score 3 · Answer 2 · answered May 14 '17 at 16:39

You need to escape the plus sign: "+" has a special meaning in regular expressions (it is a quantifier), so it is not matched as a literal character. From the documentation (?regex):

"+" The preceding item will be matched one or more times.
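To see what that quantifier does, here it is applied to an ordinary letter; the strings are made up for illustration. "a+" matches one or more consecutive a's, so each run is replaced as a whole.

```r
# "a+" matches a run of one or more 'a' characters
gsub("a+", "-", "baaad")    # "b-d": the run "aaa" is replaced once
gsub("a+", "-", "banana")   # "b-n-n-": each single 'a' is its own run
```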

To match such special characters, you need to escape them so that they are taken literally instead of being interpreted. In R you need two backslashes (\\) to escape, because the string literal itself consumes one backslash. So in your case this would be something like:
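The doubling happens only at the level of R's string syntax: the literal "\\+" holds just two characters, a backslash and a plus sign, and that pair is what the regular expression engine receives. The raw-string line assumes R 4.0.0 or later, where r"(...)" literals were introduced.

```r
pattern <- "\\+"

nchar(pattern)        # 2: a backslash followed by '+'
writeLines(pattern)   # prints \+

# Raw string (R >= 4.0.0): the same two characters, written without doubling
identical(pattern, r"(\+)")

gsub(pattern, "", "poli+tician")
```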

gsub("\\+","",df$job)

Running the above gives the desired result, removing every plus sign from your data.

So, assuming your df is:

df <- data.frame(job = c("government", "poli+tician","politician", "parliament"))

then your output will be:

> gsub("\\+","",df$job)
[1] "government" "politician" "politician"
[4] "parliament"

The backslash escape character worked perfectly! Thank you!! — Dileep Guntamadugu, May 18 '17 at 03:25
