I am attempting to utilise R for some basic text analytics.

I have a column containing a complex data type. I want to maintain a separate table of phrases that I can use to remove those phrases from the first data column.

I have tried gsubfn but without any success.

For example

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <-c("COURT","BODY CORPORATE")

Why does

x <- gsubfn(removefields,"",dirtydata)

not work?

I am hoping for this output:

c("JOHN ","@PETER","BOB 22","RUPERT ")
pogibas
  • Please include the names of any additional packages you loaded. You can also try `gsub(paste(removefields, collapse = "|"), "", dirtydata)` – Roman Oct 16 '17 at 09:04
  • Possible duplicate of [How to replace multiple strings with the same in R](https://stackoverflow.com/questions/28285480/how-to-replace-multiple-strings-with-the-same-in-r) or [this one](https://stackoverflow.com/questions/24645390/r-remove-multiple-text-strings-in-data-frame) – Roman Oct 16 '17 at 09:13

4 Answers

Try this.

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <- "COURT|BODY CORPORATE"  # one regex; spaces around | would be matched literally
x <- gsub(removefields, "", dirtydata)
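
A side note on why the pattern is written as one string: gsub() expects a single regular expression, so the alternatives are joined with |, and any spaces around the | are matched literally. Here is a small sketch of the difference on the question's data (the names with_spaces and without_spaces are made up for illustration):

```r
dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE")

# Alternatives are "COURT " (trailing space) and " BODY CORPORATE" (leading space)
with_spaces <- gsub("COURT | BODY CORPORATE", "", dirtydata)

# Alternatives are "COURT" and "BODY CORPORATE", with no extra spaces
without_spaces <- gsub("COURT|BODY CORPORATE", "", dirtydata)

print(with_spaces)
print(without_spaces)
```

With the spaces, "JOHN COURT" is left untouched because nothing follows COURT; without them, both phrases are removed and the result matches the output the question asks for.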
AshOfFire

This generalises to whatever you put into removefields and also strips any whitespace next to the removed strings. The \\s* on each side matches zero or more whitespace characters, so a phrase at the start or end of a string is still removed:

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <- c("COURT","BODY CORPORATE")
removefields <- paste0("\\s*", removefields, "\\s*", collapse = "|")
x <- gsub(removefields, "", dirtydata)
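
If the goal is tidy output without stray spaces, another option might be to match the phrases as whole words and clean up the whitespace afterwards. This is only a sketch: it assumes the phrases contain no regex metacharacters, and the names pattern and cleaned are made up for illustration.

```r
dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE")
removefields <- c("COURT", "BODY CORPORATE")

# Build one regex that matches any phrase, but only at word boundaries
pattern <- paste0("\\b(", paste(removefields, collapse = "|"), ")\\b")

# Remove the phrases
cleaned <- gsub(pattern, "", dirtydata, perl = TRUE)

# Collapse runs of whitespace to a single space, then trim both ends
cleaned <- trimws(gsub("\\s+", " ", cleaned))

print(cleaned)
```

Unlike consuming the whitespace on both sides of the phrase, this keeps a single space between the neighbouring words when a phrase sits in the middle of a string.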
Milan Valášek

We can use removeWords() from the tm package:

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <-c("COURT","BODY CORPORATE")

library(tm)
removeWords(dirtydata, removefields)

> removeWords(dirtydata, removefields)
[1] "JOHN "   "@PETER"  "BOB 22"  "RUPERT "
Hardik Gupta

Here is a version that uses only base R functions. The phrases are pasted into a single pattern separated by |, so gsub() removes any of them:

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE")
removefields <- c("COURT","BODY CORPORATE")
pastedFields <- paste0(removefields, collapse = "|")
gsub(pastedFields, "", dirtydata)
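
Since the question mentions keeping the phrases in a separate table and applying them to a data column, a rough sketch of that setup is below. The data frames customers and lookup, their column names, and the helper remove_phrases() are all hypothetical. The helper uses fixed = TRUE, so each phrase is treated as literal text and characters such as . or ( in a phrase need no escaping.

```r
# Hypothetical helper: remove each phrase in turn as a literal string
remove_phrases <- function(x, phrases) {
  for (p in phrases) {
    x <- gsub(p, "", x, fixed = TRUE)
  }
  x
}

# Hypothetical data column and separate lookup table of phrases
customers <- data.frame(
  name = c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  phrase = c("COURT", "BODY CORPORATE"),
  stringsAsFactors = FALSE
)

# Apply the lookup table to the column and store the result alongside it
customers$clean_name <- remove_phrases(customers$name, lookup$phrase)
print(customers$clean_name)
```

Because the phrases are removed one after another, their order in the lookup table can matter when one phrase contains another.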
  • Can you elaborate? I assume you are getting output in list format but are expecting a vector? If so, please add the line of code where you applied it to the column of your data – Sai Prabhanjan Reddy Oct 16 '17 at 09:27