Submit queries to Wikipedia through R

Question

I am trying to develop an R script that takes a string and submits it to the Wikipedia search box. After reaching the page for that string, the script should extract all the tables from the page. For example, if the string is Manchester United, the script should submit a query to Wikipedia that takes it to the Manchester United page, extract all the tables, and convert them into data frames.

P.S. I have just begun to try out web scraping in R, so any help would be greatly appreciated.

So... what have you done, and what exactly is the problem you are trying to solve? See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example to get a feel for what a good R question should look like. This question is at the moment too broad and not a good fit for this site. - nico, Sep 04 '14 at 10:30
So are *you* trying to develop it, or do you just want SO users to develop it for you? Because I don't see anything here that indicates any effort on your end. - David Arenburg, Sep 04 '14 at 11:49

score 1 · Answer 1 · answered Sep 04 '14 at 11:45

This question will probably be closed because it is currently too broad, but the most basic approach is to use the readHTMLTable function from the XML package. It is a useful utility function that handles basic HTML tables and returns them as a list of data frames.

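library(XML)

# Article titles use underscores in place of spaces in the URL
appURL <- "http://en.wikipedia.org/wiki/Manchester_United"

# Returns a list with one data frame per HTML table on the page
out <- readHTMLTable(appURL)

head(out[[1]], 2)
#            V1                              V2   V3
# 1   Full name Manchester United Football Club <NA>
# 2 Nickname(s)               The Red Devils[1] <NA>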
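
To go from an arbitrary string to the tables, the same call can be wrapped in a pair of small functions. This is only a sketch: `wiki_url` and `wiki_tables` are hypothetical names, the string is assumed to match an article title exactly, and the page is downloaded with `readLines` first on the assumption that the site is served over https, which `readHTMLTable` may not be able to fetch directly.

```r
# Hypothetical helper: turn a search string into an article URL.
wiki_url <- function(title) {
  # Wikipedia article paths use underscores in place of spaces
  slug <- gsub(" ", "_", trimws(title), fixed = TRUE)
  paste0("https://en.wikipedia.org/wiki/", URLencode(slug))
}

# Hypothetical helper: download the page and parse every HTML table in it.
# Requires the XML package; returns a list of data frames.
wiki_tables <- function(title) {
  # Fetch the raw HTML with base R, then hand the text to the XML parser
  html <- paste(readLines(wiki_url(title), warn = FALSE, encoding = "UTF-8"),
                collapse = "\n")
  doc <- XML::htmlParse(html, asText = TRUE, encoding = "UTF-8")
  XML::readHTMLTable(doc, stringsAsFactors = FALSE)
}
```

Calling `wiki_tables("Manchester United")` should then give a list comparable to `out` above. Some entries may be NULL or oddly shaped, because Wikipedia uses tables for layout as well as for data.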

There may be R packages that can make use of any API that exists for Wikipedia. A quick search turned up http://cran.r-project.org/web/packages/WikipediR/index.html, for example.
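
As a rough illustration of the API route, the "submit a query" step could be approximated with the MediaWiki `opensearch` action, which returns the titles that best match a search string. This sketch does not use WikipediR; it assumes the jsonlite package is installed and that the endpoint accepts R's default user agent, and `wiki_search_url` and `wiki_best_title` are hypothetical names.

```r
# Hypothetical helper: build an opensearch request for the MediaWiki API.
wiki_search_url <- function(query, limit = 1) {
  paste0("https://en.wikipedia.org/w/api.php?action=opensearch&format=json",
         "&limit=", limit,
         "&search=", URLencode(query, reserved = TRUE))
}

# Hypothetical helper: return the best-matching article title, or NA if none.
wiki_best_title <- function(query) {
  # The response is a JSON array: query, titles, descriptions, URLs
  res <- jsonlite::fromJSON(wiki_search_url(query))
  titles <- unlist(res[[2]])
  if (length(titles) == 0) NA_character_ else titles[[1]]
}
```

The title returned by `wiki_best_title("manchester united")` can then be put into the article URL, with spaces replaced by underscores, and passed to readHTMLTable.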
