I'm scraping data from the web. I used readLines(), but now I have to switch to getURL() and htmlTreeParse().
library(RCurl)  # provides getURL()
library(XML)    # provides htmlTreeParse()
a <- getURL(url)
b <- htmlTreeParse(a, encoding = "UTF-8")
The problem is that b$children$html$body returns NULL for me. Now I'm stuck trying to get each line of the parsed HTML into a vector.
I'd be thankful for any ideas.
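A minimal sketch of what may be going on, assuming the default (non-internal) tree from the XML package: the parsed `html` node is a list whose own fields are things like `name`, `attributes` and `children`, so `$body` finds nothing, while `[["body"]]` or `$children$body` look among the child nodes. The `page` string is made up for illustration and stands in for the real site.

```r
library(XML)

# Made-up page standing in for the text returned by getURL(url).
page <- "<html><head><title>t</title></head><body><p>first</p><p>second</p></body></html>"

# asText = TRUE tells the parser that `page` is HTML content, not a file name.
doc <- htmlTreeParse(page, asText = TRUE)

# `$body` looks for a list field called "body" on the html node itself;
# the question reports that this comes back as NULL.
missing_body <- doc$children$html$body

# `[[` on a node indexes its children, so this reaches the <body> element.
body_node <- doc$children$html[["body"]]

# The same element, reached through the explicit list of children.
body_via_children <- doc$children$html$children$body

xmlName(body_node)
```

This only shows how to reach the node; turning it back into lines of text is a separate step.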
//edit
I am trying to scrape from this site:
url <- "http://www.registeruz.sk/cruz-public/domain/accountingentity/show/1545622"
When I print b, the HTML of the site looks readable and everything seems fine.
//edit2
b$children$html['body']$body
seems closest to the solution
To be clearer, I would like the same output as I got from readLines(), so that each line of HTML is one element of the vector.
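One way to get that shape, sketched with base R only: split the downloaded text on line breaks. The `raw_html` string below is a made-up stand-in for what getURL(url) would return, and `html_lines` is a name chosen for illustration.

```r
# Made-up stand-in for the single string returned by getURL(url).
raw_html <- "<html>\n<body>\n<p>first</p>\n<p>second</p>\n</body>\n</html>"

# strsplit() returns a list with one element per input string;
# [[1]] pulls out the character vector for our single string.
# The pattern also accepts Windows-style "\r\n" line endings.
html_lines <- strsplit(raw_html, "\r?\n")[[1]]

length(html_lines)
html_lines[3]
```

This splits the raw response rather than the parsed tree, so the lines match what the server sent, not what the parser reconstructed.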
//final edit
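b <- htmlTreeParse(url, useInternalNodes = TRUE)
html <- b["//body"][[1]]               # first node matching the XPath //body
html <- as(html, "character")          # serialize the node back to one string
vectors <- strsplit(html, "\n")[[1]]   # [[1]] turns the one-element list into a character vector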
This seems to create the same result. Thanks, everyone, for your help.
`attr(,"class") [1] "XMLNodeList"` – Shawn Mehan Oct 24 '15 at 21:01
`Your support ID is: 17677329063826983315` which is again different. – Shawn Mehan Oct 24 '15 at 21:09