
I am using RStudio on an AWS EC2 instance. I wrote a for loop in R that scrapes a website I am authorized to access and builds data frames based on certain conditions.

After some time (around 12 hours), my EC2 instance seems to freeze: I can no longer reach the RStudio interface at the URL I normally use. The data frames I create are small, each holding text in 8 columns and only 1 row, so the data volume is tiny. I therefore don't understand how this could be a data volume issue.

I contacted Amazon EC2 support, but they only provide support for the EC2 instances themselves, not for diagnosing data volume issues caused by R scripts.

I suspect it is related to CPU rather than RAM, but I am not very knowledgeable about this. Thanks!

ML_Enthousiast
  • We can't reproduce your problem. It is probably related to a memory leak in libcurl in R, which is widely experienced. See this similar problem: https://stackoverflow.com/questions/31999766/r-memory-issues-while-webscraping-with-rvest – GoGonzo Oct 19 '18 at 09:01
  • The post you linked says the issue was resolved in the latest version of rvest, which I am using. – ML_Enthousiast Oct 19 '18 at 15:10
  • I tried to insert gc() at the end of my for loop but it did not change anything. – ML_Enthousiast Oct 20 '18 at 05:58
  • gc() won't work in this case. This answer will solve your problem: https://stackoverflow.com/a/28838683/3495076. You have to restart the R session after some safe number of iterations. So basically you need to create a loop in the Linux terminal: `for i in 1 2 3 4 5 6 7 8 9 10; do Rscript web_crawler.R; done`. The only trick you have to code is passing the latest loop index (i) to the next session (for example with saveRDS) and starting the next session from that i. The idea is to split the loop across n separate R sessions (here n=10). – GoGonzo Oct 20 '18 at 07:46
  • Wow... now I can see why so many people talk about BeautifulSoup... But do you know how I can check what the real issue is? The CPU never reaches 100% and the RAM seems to be fine. – ML_Enthousiast Oct 22 '18 at 08:05
  • No idea, sir. Good luck ;) – GoGonzo Oct 22 '18 at 08:50
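
The restart-per-chunk idea from the comments could look roughly like the sketch below. It is only an illustration under several assumptions that are not in the original thread: that web_crawler.R accepts a start and an end index as command-line arguments (for example via `commandArgs(trailingOnly = TRUE)`), and that the names `TOTAL`, `CHUNK`, `CHECKPOINT`, `WORKER`, and `run_chunks` are hypothetical. It also keeps the checkpoint in a plain text file written by the shell, rather than with saveRDS inside R as the comment suggests.

```shell
#!/bin/sh
# Sketch: run a long scraping job as several short-lived R sessions.
# ASSUMPTIONS (hypothetical, not from the thread): web_crawler.R takes a
# start and end index as arguments; all variable names below are made up.

TOTAL=${TOTAL:-1000}                      # total number of loop iterations
CHUNK=${CHUNK:-100}                       # iterations per R session
CHECKPOINT=${CHECKPOINT:-last_index.txt}  # file holding the last finished index
WORKER=${WORKER:-"Rscript web_crawler.R"} # command run once per chunk

run_chunks() {
    # Resume from the last completed index, or from 0 on the first run.
    if [ -f "$CHECKPOINT" ]; then
        start=$(cat "$CHECKPOINT")
    else
        start=0
    fi

    while [ "$start" -lt "$TOTAL" ]; do
        end=$((start + CHUNK))
        if [ "$end" -gt "$TOTAL" ]; then
            end=$TOTAL
        fi
        # Each call is a fresh process, so memory leaked inside the
        # session is returned to the OS when that process exits.
        $WORKER "$((start + 1))" "$end" || return 1
        # Record progress only after the chunk succeeded.
        echo "$end" > "$CHECKPOINT"
        start=$end
    done
}
```

Calling `run_chunks` at the end of the script starts the job. If a chunk fails or the instance is rebooted, running the script again picks up after the last index recorded in the checkpoint file.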

0 Answers