I have a very long list of websites that I'd like to scrape for their titles, descriptions, and keywords.
I'm using ContentScraper from the Rcrawler package, and I know it works, but there are certain URLs it can't handle, and it just generates the error message below. Is there any way to skip those URLs instead of stopping the entire execution?
Error: 'NULL' does not exist in current working directory
I've looked at this question, but I don't think it answers my problem. Here is the code I'm using; any advice is greatly appreciated.
library(Rcrawler)

# Scrape the title, description, and keywords from each URL
Web_Info <- ContentScraper(Url = Websites_List,
                           XpathPatterns = c('/html/head/title',
                                             '//meta[@name="description"]/@content',
                                             '//meta[@name="keywords"]/@content'),
                           PatternsName = c("Title", "Description", "Keywords"),
                           asDataFrame = TRUE)
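The workaround I have in mind is something like the sketch below: loop over Websites_List one URL at a time and wrap each ContentScraper call in tryCatch, so a failing URL is logged and replaced with NA fields instead of aborting the whole run. I haven't verified this, and the assumption that ContentScraper with asDataFrame = TRUE returns a one-row data frame per URL may be wrong.

library(Rcrawler)

xpaths <- c('/html/head/title',
            '//meta[@name="description"]/@content',
            '//meta[@name="keywords"]/@content')
fields <- c("Title", "Description", "Keywords")

# Assumption: each successful call returns a one-row data frame;
# on error we log the URL and substitute a row of NAs, then move on.
scrape_one <- function(u) {
  tryCatch(
    ContentScraper(Url = u,
                   XpathPatterns = xpaths,
                   PatternsName = fields,
                   asDataFrame = TRUE),
    error = function(e) {
      message("Skipping ", u, ": ", conditionMessage(e))
      as.data.frame(setNames(as.list(rep(NA_character_, length(fields))), fields))
    }
  )
}

Web_Info <- do.call(rbind, lapply(Websites_List, scrape_one))

If the per-URL return shape differs from a one-row data frame, the rbind step would need adjusting, which is part of what I'm unsure about.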