I'm trying to read head-to-head (h2h) data from a Tennis Abstract web page in R using the XML package.
I want the large h2h table at the bottom of the page. Its CSS selector is:
html > body > div#main > table#maintable > tbody > tr > td#stats > table#matches.tablesorter
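For comparison with the XPath attempts below, here is a minimal sketch of how the last step of that selector, `table#matches.tablesorter`, might translate to XPath. It runs against a small hypothetical page (`toy`) that only imitates the structure implied by the selector. It is not the real tennisabstract.com source. The variable names `toy` and `matches_xpath` are made up for illustration.

```r
library(XML)

# Hypothetical markup imitating the structure given by the CSS selector above
# (NOT the real tennisabstract.com source).
toy <- '<html><body><div id="main">
<table id="maintable"><tr><td id="stats">
<table id="matches" class="tablesorter"><tr><td>x</td></tr></table>
</td></tr></table>
</div></body></html>'

doc <- htmlParse(toy, asText = TRUE)

# XPath counterpart of "table#matches.tablesorter": match the id, and test the
# class as a whole word so extra classes on the element do not break the match.
matches_xpath <- paste0(
  "//table[@id='matches' and ",
  "contains(concat(' ', normalize-space(@class), ' '), ' tablesorter ')]"
)

nodes <- getNodeSet(doc, matches_xpath)
length(nodes)                 # 1
xmlGetAttr(nodes[[1]], "id")  # "matches"
```

The whole-word class test is a little more defensive than `@class='tablesorter'`, which only matches when `tablesorter` is the element's sole class.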
I have tried following the suggestions from "scraping html into r data frame".
I believe the difficulty is caused by the table being nested inside another table.
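One way to test the nesting theory offline is the following sketch. It assumes a hypothetical page in which an h2h-style table sits inside a layout table, filled with made-up values. XPath's `//` (descendant) step finds an element however deeply it is nested, and paths starting with `.//` stay inside the table once it has been selected. The rows are assembled by hand so every step is visible; `toy`, `tbl` and `h2h` are illustrative names only.

```r
library(XML)

# Hypothetical markup: a table nested in a layout table (values are made up).
toy <- '<html><body>
<table id="maintable"><tr><td id="stats">
  <table id="matches" class="tablesorter">
    <tr><th>Date</th><th>Opponent</th></tr>
    <tr><td>2015-01-01</td><td>Player A</td></tr>
    <tr><td>2015-02-01</td><td>Player B</td></tr>
  </table>
</td></tr></table>
</body></html>'

doc <- htmlParse(toy, asText = TRUE)

# "//" searches all descendants, so the nesting depth does not matter.
tbl <- getNodeSet(doc, "//table[@id='matches']")[[1]]

# Paths starting with "." are evaluated relative to the inner table only.
header <- xpathSApply(tbl, ".//tr/th", xmlValue)
rows   <- getNodeSet(tbl, ".//tr[td]")  # rows that contain data cells
cells  <- lapply(rows, function(r) xpathSApply(r, "./td", xmlValue))

# One row per <tr>, one column per <td>.
h2h <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
names(h2h) <- header
h2h
```

If this works on the toy page but not on the real one, the nesting is probably not the cause.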
```r
url <- "http://www.tennisabstract.com/cgi-bin/player.cgi?p=NovakDjokovic&f=ACareerqqs00&view=h2h"
library(RCurl)
library(XML)
webpage <- getURL(url)
webpage <- readLines(tc <- textConnection(webpage)); close(tc) # doesn't contain the h2h table
pagetree <- htmlTreeParse(webpage, error = function(...){}, useInternalNodes = TRUE)
results <- xpathSApply(pagetree, "//*/table[@class='tablesorter']/tr/td", xmlValue) # returns NULL
tables <- readHTMLTable(url, stringsAsFactors = TRUE) # returns 4 tables, none of them the one I want
```
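Two things might explain the empty `xpathSApply()` result; the sketch below checks both on hypothetical toy markup. First, `/tr` in the XPath means "a `tr` that is a direct child of `table`". Browser dev tools show an inserted `tbody` (as in the CSS selector above), but whether one is in the raw HTML depends on the source, so `//tr` is the safer step. Second, if the downloaded HTML has no `table` with `id="matches"` at all (as the comment on the `readLines` line suggests), no XPath expression can find it. The table may then be added after the page loads, for example by JavaScript in the browser. I have not verified this for this site. `page_has_matches_table()` is a hypothetical helper, not part of XML.

```r
library(XML)

# Hypothetical markup with an explicit <tbody> between <table> and <tr>.
with_tbody <- '<html><body>
<table class="tablesorter"><tbody><tr><td>cell</td></tr></tbody></table>
</body></html>'

doc <- htmlParse(with_tbody, asText = TRUE)

# "/tr": tr must be a direct child of table, so the <tbody> breaks the match.
child_hits <- xpathSApply(doc, "//table[@class='tablesorter']/tr/td", xmlValue)
# "//tr": tr anywhere below table, with or without <tbody>.
desc_hits  <- xpathSApply(doc, "//table[@class='tablesorter']//tr/td", xmlValue)

length(child_hits)  # 0
desc_hits           # "cell"

# Hypothetical helper: does the raw HTML contain a table with id="matches"?
page_has_matches_table <- function(html) {
  page <- htmlParse(html, asText = TRUE)
  length(getNodeSet(page, "//table[@id='matches']")) > 0
}

page_has_matches_table(with_tbody)  # FALSE: the toy page has no id="matches"
```

On the real page, `page_has_matches_table(getURL(url))` returning `FALSE` would mean the table is not in the HTML the server sends.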
I'm new to HTML parsing, so please bear with me.