I'm currently trying to scrape text from an HTML tree that I've parsed as follows:-
require(RCurl)
require(XML)
query.IMDB <- getURL('http://www.imdb.com/title/tt0096697/epdate') #Simpsons episodes, rated and ordered by broadcast date
names(query.IMDB)
query.IMDB
query.IMDB <- htmlParse(query.IMDB)
df.IMDB <- getNodeSet(query.IMDB, "//*/div[@class='rating rating-list']")
My first attempt was just to use grep on the resulting vector, but this fails.
data[grep("Users rated this", "", df.IMDB)]
#Error in data... object of type closure is not subsettable
My next attempt was to use grep on the individual points in the query.IMDB vector:-
vect <- numeric(length(df.IMDB))
for (i in 1:length(df.IMDB)){
vect[i] <- data[grep("Users rated this", "", df.IMDB)]
}
but this also throws the closure not subsettable error.
Finally trying the above function without data[]
around the grep
throws
Error in df.IMDB[i] <- grep("Users rated this", "", df.IMDB[i]) : replacement has length zero
I'm actually hoping to eventually replace everything except a number of the form [0-9].[0-9]
following the given text string with blank space, but I'm doing a simpler version first to get the thing working.
Can anyone advise what function I should be using to edit the text in each point on my query.IMDB vector