I am trying to retrieve the average user rating from a Times of India Hindi movie review page, but my scraper returns nothing. I suspect this is because the rating is loaded dynamically.

This is the code I have written:

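library(rvest)
avg_readerrating <- c()

# Keep the URL on a single line: line breaks inside the string corrupt the address.
v2 <- "http://timesofindia.indiatimes.com/entertainment/hindi/movie-reviews/moviearticlelistdatewise1/2742919.cms?query=*:*&startdate=2015-01-01&enddate=2015-01-31&sectionid=2742919"

# v2 contains no "monthR" or "dateR" placeholders, and month, date and i are
# not defined here, so the gsub() substitution is dropped and v2 is used as-is.
url <- v2
download.file(url, destfile = 'H:/whatever.html')
web <- read_html('H:/whatever.html')

# Select the node expected to hold the average reader rating and extract its text.
avg_readerrating_html <- html_nodes(web, xpath = '//*[@id="articlenew"]/div/div[2]/div[2]/div/span[2]')
avg_readerrating_Tab <- html_text(avg_readerrating_html)
avg_readerrating <- c(avg_readerrating, avg_readerrating_Tab)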

When I run this code, the output is just "". How can I scrape dynamically loaded data from a website?

lmiguelvargasf
hari Kotha
  • Print what `url` is after the substitution. Can you `curl` that address? – beroe May 23 '17 at 05:31
  • The construction of `v2` includes line breaks. This will cause problems. Also, the `month[i]` and `date[i]` calls will fail, as nothing in your post defines `month`, `date`, or `i`. Please read [this post on creating reproducible examples](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Peter May 23 '17 at 21:36
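
As the comments note, the URL string must not contain line breaks, and any per-month substitution needs defined inputs. The following is a minimal base R sketch of one way to build such URLs, assuming the query parameters seen in the question (`startdate`, `enddate`, `sectionid`) are the only parts that vary. The helper name `build_month_url` is hypothetical and not part of any package.

```r
# Base address of the listing page, assembled without embedded line breaks.
base_url <- paste0(
  "http://timesofindia.indiatimes.com/entertainment/hindi/movie-reviews/",
  "moviearticlelistdatewise1/2742919.cms"
)

# Hypothetical helper: build the listing URL for one calendar month.
build_month_url <- function(year, month) {
  start <- as.Date(sprintf("%d-%02d-01", year, month))
  # The first day of the next month minus one day is the last day of this month.
  end <- seq(start, by = "month", length.out = 2)[2] - 1
  sprintf("%s?query=*:*&startdate=%s&enddate=%s&sectionid=2742919",
          base_url, format(start), format(end))
}

# One URL per month for January to March 2015.
urls <- vapply(1:3, function(m) build_month_url(2015, m), character(1))
print(urls)
```

Each element of `urls` can then be passed to `download.file()` or `read_html()` inside a loop, which avoids both the line-break problem and the undefined `month[i]` and `date[i]`.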

1 Answer

If you want to parse the HTML tables on the page, try the XML package:

library(XML)    # readHTMLTable()
library(RCurl)  # getURL()
library(rlist)  # list.clean()

v2 <- "http://timesofindia.indiatimes.com/entertainment/hindi/movie-reviews/moviearticlelistdatewise1/2742919.cms?query=*:*&startdate=2015-01-01&enddate=2015-01-31&sectionid=2742919"

# Fetch the raw HTML as a string, then parse every <table> into a data frame.
theurl <- getURL(v2, .opts = list(ssl.verifypeer = FALSE))
tables <- readHTMLTable(theurl)
# Drop NULL entries left by tables that could not be parsed.
tables <- list.clean(tables, fun = is.null, recursive = FALSE)

This way you can retrieve the user ratings table from the website. You can also follow this for other alternatives.
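
To check whether a value such as the rating is present in the static HTML at all, it can help to test the extraction step against a small inline document first. The sketch below uses rvest for this. The HTML snippet, the `rating` class name, and the simplified XPath are invented for illustration, and the real page markup may differ.

```r
library(rvest)

# Invented stand-in for the downloaded page; not the site's actual markup.
sample_html <- paste0(
  '<html><body><div id="articlenew">',
  '<span class="label">Avg Reader Rating</span>',
  '<span class="rating"> 3.5 </span>',
  '</div></body></html>'
)

page <- read_html(sample_html)

# Select the rating node and extract its text, trimming surrounding whitespace.
nodes <- html_nodes(page, xpath = '//*[@id="articlenew"]//span[@class="rating"]')
ratings <- html_text(nodes, trim = TRUE)

# Zero matches or empty text means the value is not in the HTML that was parsed.
if (length(ratings) == 0 || all(ratings == "")) {
  message("No rating text found in the static HTML.")
} else {
  print(ratings)
}
```

If the same check against the real downloaded file yields no nodes or only empty strings, that is consistent with the value being filled in by JavaScript after page load, which `read_html()` does not execute.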

parth