
I'm trying to remove the '+' character from one of the string elements of a data frame, but I can't find a way to do it.

Below is the data frame.

txtdf <- structure(list(ID = 1:9, Var1 = structure(c(1L, 1L, 1L, 1L, 4L, 
            5L, 5L, 2L, 3L), .Label = c("government", "parliament", "parliment", 
            "poli+tician", "politician"), class = "factor")), .Names = c("ID", 
            "Var1"), class = "data.frame", row.names = c(NA, -9L))
#  ID   Var1
#  1    government
#  2    government
#  3    government
#  4    government
#  5    poli+tician
#  6    politician
#  7    politician
#  8    parliament
#  9    parliment

I tried two ways; neither gave the expected result:

Way1

txtdf <- gsub("[:punct:]","", txtdf)
# [1] "goverme" "goverme" "goverme" "goverme" "oli+iia" "oliiia"  "oliiia" 
# [8] "arliame" "arlime" 

I don't understand what's wrong here. I want only the '+' character in the 5th element to be removed, but every element is altered as shown above.

Way2

txtdf<-gsub("*//+","",txtdf)
# [1] "government"  "government"  "government"  "government"  "poli+tician"
# [6] "politician"  "politician"  "parliament"  "parliment" 

Here there is no change at all. I tried to escape the + character using double slashes.

KenHBS

2 Answers

Simply use gsub() with fixed = TRUE, which treats the pattern as a literal string, so no regular expression is needed. You have to do the replacement column by column, specifying the column name each time:

txtdf <- data.frame(job = c("government", "poli+tician", "parliament"))
txtdf

gives

          job
1  government
2 poli+tician
3  parliament

Now replace the "+":

txtdf$job <- gsub("+", "", txtdf$job, fixed = TRUE)
txtdf

The result is:

         job
1 government
2 politician
3 parliament
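
If several columns need the same cleanup, the per-column replacement can be looped. The following is only a minimal sketch: the data frame `txtdf2` and its `dept` column are made up for illustration, and only character or factor columns are touched.

```r
# Hypothetical example data; the `dept` column is invented for illustration.
txtdf2 <- data.frame(
  id   = 1:3,
  job  = c("government", "poli+tician", "parliament"),
  dept = c("a+b", "c", "d+"),
  stringsAsFactors = FALSE
)

# Flag the text columns (character or factor).
is_text <- vapply(txtdf2,
                  function(col) is.character(col) || is.factor(col),
                  logical(1))

# Strip every literal "+" from each text column; numeric columns are left alone.
txtdf2[is_text] <- lapply(txtdf2[is_text],
                          function(col) gsub("+", "", col, fixed = TRUE))

txtdf2
```

Note that gsub() returns a character vector, so a factor column comes back as character; wrap the result in factor() if the factor type matters.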
R Yoda

You need to escape your plus sign. "+" has a special meaning in a regular expression (it is a quantifier), so an unescaped "+" is not matched as a literal character. From the documentation (?regex):

"+" The preceding item will be matched one or more times.

To match such a special character literally, you need to escape it so that its special meaning is not applied. In an R string you need two backslashes (\\) to do this: the first escapes the second inside the string literal, so the regex engine receives \+. In your case that looks like this:

gsub("\\+","",df$job)

Running the above gives the desired result by removing every plus sign from your data.

So, assuming your df is:

df <- data.frame(job = c("government", "poli+tician","politician", "parliament"))

then your output will be:

> gsub("\\+","",df$job)
[1] "government" "politician" "politician"
[4] "parliament"
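
For comparison, here is a short sketch of several equivalent ways to match a literal plus sign, together with the POSIX class form. The class must be written with double brackets, `[[:punct:]]`. With single brackets, `[:punct:]` is just an ordinary character set containing `:`, `p`, `u`, `n`, `c` and `t`, which is why the first attempt in the question stripped letters. The vector `x` is illustrative.

```r
x <- c("government", "poli+tician", "politician", "parliament")

a  <- gsub("\\+", "", x)               # escaped: "+" is no longer a quantifier
b  <- gsub("[+]", "", x)               # inside a character class, "+" is literal
c1 <- gsub("+", "", x, fixed = TRUE)   # fixed = TRUE: no regex interpretation at all
d  <- gsub("[[:punct:]]", "", x)       # removes all punctuation, not only "+"

# Single brackets: an ordinary set of the characters ':', 'p', 'u', 'n', 'c', 't',
# so letters are removed and the "+" is kept (as in the question's first attempt).
e <- gsub("[:punct:]", "", x)

a
```

The first three forms target only "+"; `[[:punct:]]` is broader and would also remove hyphens, periods and other punctuation, so it is only appropriate when that is intended.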
PKumar