I have a column, `data`, in a data frame, `old_df`.
A sample row looks like:
```
data
trying URL 'https://maps.googleapis.com/maps/api/streetview?&location=13.5146367326733,100.380686367492&size=8000x5333&heading=0&fov=90&pitch=0&key='Content type 'image/jpeg' length 59782 bytes (58 KB)
downloaded 58 KB
```
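Since the wanted part of each value is the latitude,longitude pair after `location=`, one option is to capture that pair directly with a regular expression instead of removing everything around it. The sketch below runs on the sample value above and assumes every row contains `location=` followed by two plain decimal numbers; `raw`, `pattern` and `coords` are illustrative names.

```r
# Sample value from the question (the key= parameter is empty there too)
raw <- "trying URL 'https://maps.googleapis.com/maps/api/streetview?&location=13.5146367326733,100.380686367492&size=8000x5333&heading=0&fov=90&pitch=0&key='Content type 'image/jpeg' length 59782 bytes (58 KB)"

# Capture the "lat,lng" pair after "location=" and discard the rest of the string.
# Assumption: coordinates are plain decimals, optionally negative.
pattern <- ".*location=(-?[0-9.]+,-?[0-9.]+).*"
coords <- sub(pattern, "\\1", raw)
print(coords)
```

For the whole column the same call would be `sub(pattern, "\\1", old_df$data)`. Note that `sub()` returns a value unchanged when the pattern does not match, so rows without `location=` would pass through as-is.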
Using `stopwords`, I have removed the words I do not want and am left with:
```
data
?&13.5146367326733,100.380686367492
?&13.5162026732673,100.66581378616
```
```r
library(tm)

# Words and URL fragments to strip from old_df$data
stopwords = c('trying',
              'URL',
              "'",
              '&',
              'location=',
              'https://maps.googleapis.com/maps/api/streetview',
              'size=8000x5333',
              'heading',
              '=0&fov=90&pitch=0&key=',
              'Content',
              'type',
              'image/jpeg',
              'length',
              'bytes',
              'KB')

new_df <- as.data.frame(removeWords(old_df$data, stopwords))
```
However, `?&` remains in the `data` column before the numbers, which I don't want. I have tried including `?`, `&` and `?&` in `stopwords`, yet they remain. Any ideas how to delete them?
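For the leftover `?&` itself, a literal substitution avoids regular expressions entirely: with `fixed = TRUE`, `gsub()` treats its pattern as a plain string, so `?` needs no escaping. The data frame below is a hypothetical stand-in for `new_df`, with a column named `data` holding the two results shown above.

```r
# Hypothetical stand-in for new_df after removeWords()
new_df <- data.frame(data = c("?&13.5146367326733,100.380686367492",
                              "?&13.5162026732673,100.66581378616"))

# fixed = TRUE matches "?&" literally, so "?" is not treated as a quantifier
new_df$data <- gsub("?&", "", new_df$data, fixed = TRUE)
print(new_df$data)
```

If `?&` should only be dropped when it starts the value, an anchored regex with the `?` escaped does the same job: `sub("^\\?&", "", new_df$data)`.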
In fact, when I include these combinations in `stopwords`, I get the error:

```
PCRE pattern compilation error 'quantifier does not follow a repeatable item' at '?|&|')\b'
```
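The error arises because `removeWords()` joins the words into a single Perl-compatible regular expression rather than matching them literally. As far as I can tell from the tm source, the pattern has the form `(*UCP)\b(word1|word2|...)\b`, so a bare `?` ends up right after `(` or `|` with nothing to repeat. The sketch below re-creates that pattern in base R; `build_pattern()` and `escape_regex()` are hypothetical helpers, not tm functions. It also shows a second pitfall: once escaped, the pattern compiles, but `?&` at the start of a value is still not removed, because `\b` requires a word character on one side and neither `?` nor `&` is one.

```r
# Hypothetical helper: backslash-escape regex metacharacters so each word is literal
escape_regex <- function(words) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", words, perl = TRUE)
}

# Hypothetical re-creation of the pattern removeWords() is assumed to build
build_pattern <- function(words) {
  sprintf("(*UCP)\\b(%s)\\b", paste(sort(words, decreasing = TRUE), collapse = "|"))
}

words <- c("?&", "?", "&")
x <- "?&13.5146367326733,100.380686367492"

# 1. Unescaped: "?" has nothing to repeat, so PCRE refuses to compile the pattern
unescaped <- suppressWarnings(
  tryCatch(gsub(build_pattern(words), "", x, perl = TRUE),
           error = function(e) "compilation error")
)
print(unescaped)

# 2. Escaped: the pattern compiles, but x comes back unchanged, because there is
#    no word boundary before "?" or "&" at the start of the string
escaped <- gsub(build_pattern(escape_regex(words)), "", x, perl = TRUE)
print(escaped)
```

So escaping alone is not enough with `removeWords()`; a plain `gsub()` without word boundaries, or with `fixed = TRUE`, is needed for punctuation like `?&`.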