I'd like to ask a question about an issue I'm currently stuck on. When trying to scrape an HTML page (using RCurl), I encounter this error: "Error in curlMultiPerform(multiHandle): embedded nul in string". I have read a lot about this type of error and advice on how to deal with it (including a post by Duncan Temple Lang, the creator of the RCurl package). But even after applying his advice (as follows), I still get the same error:
htmlPage <- rawToChar(getURLContent(url, followlocation = TRUE, binary = TRUE))
doc <- htmlParse(htmlPage, asText=TRUE)
Am I missing something? Any help will be much appreciated!
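For what it's worth, here is a self-contained illustration of one workaround I have seen suggested elsewhere (this is my own addition, not from Duncan's post): drop the nul bytes from the raw vector before calling rawToChar(), using a made-up byte vector instead of a real download:

```r
## A raw vector with an embedded nul byte (0x00), like the content
## that makes rawToChar() fail with "embedded nul in string"
bytes <- as.raw(c(0x3c, 0x68, 0x74, 0x6d, 0x6c, 0x3e, 0x00, 0x68, 0x69))

## Filter out the nul bytes first, then convert to character;
## the same filter can be applied to the result of getURLContent()
clean <- rawToChar(bytes[bytes != as.raw(0)])
print(clean)
```

In the real call, the filter would go between getURLContent(..., binary = TRUE) and rawToChar().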
Edit:
However, there is a second error I didn't mention in the original post. It occurs here:
data <- lapply(i <- 1:length(links),
               function(url) try(read.table(bzfile(links[i]),
                                            sep = ",", row.names = NULL)))
The error: Error in bzfile(links[i]) : invalid 'description' argument.
'links' is a list of the files' full URLs, constructed as follows:
links <- lapply(filenames, function(x) paste(url, x, sep="/"))
By using links[i], I'm trying to refer to the current element of the links list in the ongoing iteration of lapply().
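To illustrate what I think is going on (my own diagnosis, so take it with a grain of salt): inside the anonymous function, i is not the iteration index at all; it is the whole vector assigned by i <- 1:length(links), so links[i] is the entire list, which is not a valid 'description' for bzfile(). A minimal sketch with made-up file names:

```r
## A small stand-in for the real list of URLs
links <- list("a.csv.bz2", "b.csv.bz2")

## This mirrors the original call: i becomes the whole vector,
## so links[i] selects every element, not the current one
i <- 1:length(links)
subset <- links[i]
print(length(subset))   # the entire list, not a single URL

## lapply() already passes each element to the function argument,
## so the element itself should be used instead of indexing with i
classes <- lapply(links, function(url) class(url))
```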
Second Edit:
Currently I'm struggling with the following code. I have found several other places where people advise exactly the same approach, which makes me curious why it doesn't work in my situation...
getData <- function(x) try(read.table(bzfile(x), sep = ",", row.names = NULL))
data <- lapply(seq_along(links), function(i) getData(links[[i]]))
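One thing I tried while debugging (with hypothetical names, for illustration only): building links as a plain character vector via vectorised paste() instead of lapply(), so that any links[i] is already a length-one character string of the kind bzfile() expects:

```r
## Hypothetical file names and base URL, for illustration only
filenames <- c("data1.csv.bz2", "data2.csv.bz2")
base <- "http://example.com/files"

## paste() is vectorised, so this yields a character vector directly,
## avoiding the list-vs-character distinction entirely
links <- paste(base, filenames, sep = "/")
print(links[1])
```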