I am trying to develop an R script that takes a string and submits it to the Wikipedia search box. After reaching the page for that string, the script should extract all the tables from the page. For example, if the string is Manchester United, the script should submit the query to Wikipedia, land on the Manchester United page, extract all the tables and convert them into data frames.

P.S.: I have only just started trying out web scraping in R, so any help would be greatly appreciated.

Usman Khan
  • So... what have you done, and what exactly is the problem you are trying to solve? See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example to get a feel for what a good R question should be. At the moment this question is too broad and not a good fit for this site. – nico Sep 04 '14 at 10:30
  • So are *you* trying to develop it, or do you just want SO users to develop it for you? Because I don't see anything here that indicates any effort on your end. – David Arenburg Sep 04 '14 at 11:49

This question will probably be closed because it is rather broad at the moment, but at the most basic level you can use the readHTMLTable function from the XML package. It is a useful utility function that handles basic HTML tables and returns a list with one data frame per table.

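library(XML)
library(httr)  # XML cannot download https pages itself, so fetch the page with httr first

# Article URLs use underscores in place of spaces
appURL <- "https://en.wikipedia.org/wiki/Manchester_United"
page <- content(GET(appURL), as = "text", encoding = "UTF-8")
# readHTMLTable returns a list with one data frame per <table> on the page
out <- readHTMLTable(htmlParse(page, asText = TRUE, encoding = "UTF-8"),
                     stringsAsFactors = FALSE)
> head(out[[1]], 2)
           V1                              V2   V3
1   Full name Manchester United Football Club <NA>
2 Nickname(s)               The Red Devils[1] <NA>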
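To get closer to what you asked for (going through the search box instead of hard-coding the article URL), a minimal sketch could wrap the approach above in a function. It assumes that `https://en.wikipedia.org/w/index.php?search=...` behaves like pressing "Go" in the search box, i.e. it redirects straight to the article when a title matches, and that httr's default redirect following takes you there. The names `wiki_search_url`, `html_tables` and `wiki_tables` are hypothetical helpers, not part of any package.

```r
library(httr)
library(XML)

# Hypothetical helper: the URL the search box submits. Without fulltext=1 the
# search is assumed to jump straight to an article whose title matches.
wiki_search_url <- function(query) {
  paste0("https://en.wikipedia.org/w/index.php?search=",
         URLencode(query, reserved = TRUE))
}

# Hypothetical helper: every <table> in an HTML string as a list of data frames.
html_tables <- function(html) {
  doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Hypothetical wrapper: search, follow the redirect to the article, return its tables.
wiki_tables <- function(query) {
  resp <- GET(wiki_search_url(query))  # httr follows redirects by default
  stop_for_status(resp)                # fail loudly on HTTP errors
  html_tables(content(resp, as = "text", encoding = "UTF-8"))
}
```

Calling `tabs <- wiki_tables("Manchester United")` should then give you a list of data frames. Expect it to contain the infobox and navigation boxes as well as the actual content tables, so you will probably want to filter them (for example on `class="wikitable"`). If the search has no exact match you end up on the results page and get few or no tables.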

There may also be R packages that wrap the Wikipedia API, if you would rather query it than scrape the HTML. A quick search turned up http://cran.r-project.org/web/packages/WikipediR/index.html, for example.
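As a rough illustration of the API route, here is a sketch that calls the MediaWiki API directly with httr and jsonlite rather than through WikipediR (I have not relied on WikipediR's own functions here). It assumes `action=query&list=search` returns matching titles under `query$search`, and that `action=parse` with `formatversion=2` returns the rendered article HTML as a plain string in `parse$text`. The function names are hypothetical.

```r
library(httr)
library(jsonlite)
library(XML)

api <- "https://en.wikipedia.org/w/api.php"

# Ask the MediaWiki search API (list=search) for the best-matching article title.
wiki_top_title <- function(query) {
  resp <- GET(api, query = list(action = "query", list = "search",
                                srsearch = query, srlimit = "1",
                                format = "json", formatversion = "2"))
  stop_for_status(resp)
  res <- fromJSON(content(resp, as = "text", encoding = "UTF-8"),
                  simplifyVector = FALSE)
  hits <- res$query$search
  if (length(hits) == 0) stop("No Wikipedia article found for: ", query)
  hits[[1]]$title
}

# Assumed response shape: with formatversion=2, parse$text holds the HTML string.
parse_html_from_json <- function(json) {
  fromJSON(json, simplifyVector = FALSE)$parse$text
}

# Hypothetical wrapper: search, fetch the article's rendered HTML, parse its tables.
wiki_api_tables <- function(query) {
  resp <- GET(api, query = list(action = "parse", page = wiki_top_title(query),
                                prop = "text", redirects = "1",
                                format = "json", formatversion = "2"))
  stop_for_status(resp)
  html <- parse_html_from_json(content(resp, as = "text", encoding = "UTF-8"))
  doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
  readHTMLTable(doc, stringsAsFactors = FALSE)
}
```

`wiki_api_tables("Manchester United")` should return the same kind of list that readHTMLTable gives above, with the API's search ranking choosing the article instead of the search box.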

jdharrison