
I want to extract the "Match by match list" table from

http://stats.espncricinfo.com/ci/engine/player/50710.html?class=2;template=results;type=batting;view=match

I'm new to R and don't know much about extracting data from web pages. I used this code to extract the table:

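library(XML)

fileUrl <- "http://stats.espncricinfo.com/ci/engine/player/50710.html?class=2;template=results;type=batting;view=match"
# Parse the page into an internal document so XPath queries can be run against it
sanga <- htmlTreeParse(fileUrl, useInternalNodes = TRUE)
# Collect the text of every <tr class="data1"> row
sanga.data <- xpathSApply(sanga, "//tr[@class='data1']", xmlValue)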

However, I end up with a one-column matrix in which each column of the original table is represented as a row. I read the thread "Scraping html tables into R data frames using the XML package" but still cannot figure out how to get the data into a table format.

Community
Rodrigo
    The `XML` package has a `readHTMLTable` function. You could use `readHTMLTable(sanga, which = 50)` or `readHTMLTable(sanga)$"Match by match list"`. – lukeA Jan 19 '15 at 18:43

1 Answer


You'll need to work a bit with the column names (and probably delete the NA 'spacer' column), but it's straightforward to get to the table you want with the proper XPath:

library(rvest)
library(magrittr)

pg <- read_html("http://stats.espncricinfo.com/ci/engine/player/50710.html?class=2;template=results;type=batting;view=match")  # read_html() replaces the deprecated html()

pg %>% 
  html_nodes(xpath="//tr[@class='data1']/../..") %>%  # get to a reasonable set of tables (there are many)
  extract2(2) %>%                                     # we want the second one
  html_table(header=TRUE, trim=TRUE) -> data          # there's a header and pls trim the blanks

str(data)
## 'data.frame':  397 obs. of  11 variables:
##  $ Bat1      : chr  "35" "85" "36*" "DNB" ...
##  $ Runs      : chr  "35" "85" "36" "-" ...
##  $ BF        : chr  "55" "116" "47" "-" ...
##  $ SR        : chr  "63.63" "73.27" "76.59" "-" ...
##  $ 4s        : chr  "4" "11" "3" "-" ...
##  $ 6s        : chr  "0" "0" "0" "-" ...
##  $           : logi  NA NA NA NA NA NA ...
##  $ Opposition: chr  "v Pakistan" "v South Africa" "v Pakistan" "v South Africa" ...
##  $ Ground    : chr  "Galle" "Galle" "Colombo (RPS)" "Colombo (SSC)" ...
##  $ Start Date: chr  "5 Jul 2000" "6 Jul 2000" "9 Jul 2000" "11 Jul 2000" ...
##  $           : chr  "ODI # 1603" "ODI # 1604" "ODI # 1608" "ODI # 1610" ...
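
The column-name and spacer-column cleanup mentioned at the top could look roughly like the sketch below. It is only an illustration: to stay runnable offline it rebuilds the first four rows from the str() output above rather than scraping the page, and the new column names (Fours, Sixes, StartDate, MatchId, NotOut) are made up here, not taken from the site. It also assumes an English locale for parsing the month abbreviations.

```r
# Toy stand-in for the scraped table: the first four rows shown in str(data).
data <- data.frame(
  c("35", "85", "36*", "DNB"),
  c("35", "85", "36", "-"),
  c("55", "116", "47", "-"),
  c("63.63", "73.27", "76.59", "-"),
  c("4", "11", "3", "-"),
  c("0", "0", "0", "-"),
  c(NA, NA, NA, NA),
  c("v Pakistan", "v South Africa", "v Pakistan", "v South Africa"),
  c("Galle", "Galle", "Colombo (RPS)", "Colombo (SSC)"),
  c("5 Jul 2000", "6 Jul 2000", "9 Jul 2000", "11 Jul 2000"),
  c("ODI # 1603", "ODI # 1604", "ODI # 1608", "ODI # 1610"),
  stringsAsFactors = FALSE
)
names(data) <- c("Bat1", "Runs", "BF", "SR", "4s", "6s", "",
                 "Opposition", "Ground", "Start Date", "")

# Drop any column that is entirely NA (the 'spacer' column).
all_na <- vapply(data, function(col) all(is.na(col)), logical(1))
clean <- data[!all_na]

# Give every remaining column a syntactic name (hypothetical names).
names(clean) <- c("Bat1", "Runs", "BF", "SR", "Fours", "Sixes",
                  "Opposition", "Ground", "StartDate", "MatchId")

# A trailing "*" in Bat1 marks a not-out innings.
clean$NotOut <- grepl("\\*$", clean$Bat1)

# "-" means no value (e.g. did not bat): turn it into NA, then make numeric.
num_cols <- c("Runs", "BF", "SR", "Fours", "Sixes")
clean[num_cols] <- lapply(clean[num_cols], function(col) {
  as.numeric(ifelse(col == "-", NA, col))
})

# Strip the leading "v " and parse the dates (%b depends on the locale).
clean$Opposition <- sub("^v ", "", clean$Opposition)
clean$StartDate  <- as.Date(clean$StartDate, format = "%d %b %Y")

str(clean)
```

On the real scrape the same steps should apply to all 397 rows, though other placeholder values may turn up in the numeric columns, so it is worth checking for unexpected NAs after the conversion.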
hrbrmstr