Based on this answer, you could use stri_replace_all_fixed() from stringi and wrap it in content_transformer() to preserve the corpus structure:
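library(tm)
library(stringi)
data("crude")  # example corpus shipped with tm

# df is assumed to exist already, with character columns `replace` and `with`
corp <- tm_map(crude, content_transformer(
  function(x) {
    stri_replace_all_fixed(x, df$replace, df$with, vectorize_all = FALSE)
  })
)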
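To see what vectorize_all = FALSE does, here is a minimal sketch on a plain character string. The df below is a hypothetical lookup table, made up for illustration and not necessarily the one from the question; only the column names match the code above.

```r
library(stringi)

# Hypothetical lookup table (an assumption for this example)
df <- data.frame(
  replace = c("crude", "oil", "price"),
  with    = c("xcrude", "xoil", "xprice"),
  stringsAsFactors = FALSE
)

x <- "crude oil price"

# vectorize_all = FALSE: every pattern is applied, in order, to each element of x,
# so one input string yields one output string
res <- stri_replace_all_fixed(x, df$replace, df$with, vectorize_all = FALSE)
res
# "xcrude xoil xprice"

# Default (vectorize_all = TRUE): x is recycled against the patterns, so the
# result has one element per pattern, each with only that pattern replaced
res_vec <- stri_replace_all_fixed(x, df$replace, df$with)
length(res_vec)
# 3
```

With the default you would get several partially replaced copies of each document, which is why the flag is set to FALSE inside content_transformer().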
Or use multigsub() from qdap:
library(qdap)

corp <- tm_map(crude, content_transformer(
  function(x) {
    # fixed = FALSE treats the entries of df$replace as regular expressions
    multigsub(df$replace, df$with, x, fixed = FALSE)
  })
)
Which gives:
> corp[[1]][1]
"Diamond Shamrock Corp said that\neffective today it had cut its
contract xprices for xcrude xoil by\n1.50 dlrs a barrel.\n The reduction brings its posted xprice for West Texas\nIntermediate to
16.00 dlrs a barrel, the copany said.\n \"The xprice reduction today was made in the light of falling\nxoil product xprices
and a weak xcrude xoil market,\" a company\nspokeswoman said.\n
Diamond is the latest in a line of U.S. xoil companies that\nhave
cut its contract, or posted, xprices over the last two
days\nciting weak xoil markets.\n Reuter"
You can then apply other tm functions to the resulting corpus:
> DocumentTermMatrix(corp)
#<<DocumentTermMatrix (documents: 20, terms: 1269)>>
#Non-/sparse entries: 2262/23118
#Sparsity : 91%
#Maximal term length: 17
#Weighting : term frequency (tf)