I'm using Rcrawler to crawl a vector of URLs. For most of them it works well, but every now and then one of them doesn't get crawled. At first I only noticed this on https:// sites, which was addressed here. But I'm using version 0.1.7, which is supposed to have https:// support.
I also found another user who is having the same problem, but with http:// links as well. I checked on my instance, and the sites from that post didn't crawl properly for me either.
Here's what I get when I try to crawl one of these sites:
>library(Rcrawler)
>Rcrawler("https://manager.submittable.com/beta/discover/?page=1&sort=")
>In process : 1..
Progress: 100.00 % : 1 parssed from 1 | Collected pages: 1 |
Level: 1
+ Check INDEX dataframe variable to see crawling details
+ Collected web pages are stored in Project folder
+ Project folder name : manager.submittable.com-191922
+ Project folder path : /home/anna/Documents/Rstudio/Submittable/manager.submittable.com-191922
Any thoughts? I'm still waiting for a reply from the package's creator.
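One diagnostic I'm considering (a sketch, assuming Rcrawler 0.1.7, where LinkExtractor is documented to return a list with Info, InternalLinks, and ExternalLinks components) is to fetch the problem page once and check whether the crawler can see any internal links at all:

```r
# Diagnostic sketch: fetch the page a single time with Rcrawler's own
# LinkExtractor and inspect what the crawler actually extracts from it.
library(Rcrawler)

page <- LinkExtractor(url = "https://manager.submittable.com/beta/discover/?page=1&sort=")

# If InternalLinks comes back empty, Rcrawler has nothing to follow past
# level 1, which would match the behavior above: one page collected, done.
length(page$InternalLinks)
```

If that count is zero, the issue may be that the page builds its links with JavaScript rather than serving them in the HTML, so the crawler stops after the first page; but that's a guess on my part.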