1

How do I use PHP to get the first paragraph of any article from Wikipedia through their MediaWiki API?

I am open to all suggestions. Most probably CURL or XML will come in handy.

Yotam Omer
jaguarhaus
  • What makes you think this is trivially possible? As far as I'm aware, there's nothing in the API about first paragraphs... – lonesomeday Feb 21 '12 at 16:31
  • The problem you have isn't an issue with Wikipedia, but working with the result you get back. You should create a new question with the example page text/data, asking how to parse out just the first paragraph. – Brad Feb 21 '12 at 16:40

2 Answers

2

You can use the API like so:

http://en.wikipedia.org/w/api.php?action=parse&page=Stack_overflow&format=xml&prop=text&section=0

This returns an XML document with the following structure:

<?xml version="1.0"?>
<api>
  <parse title="Article Title">
    <text xml:space="preserve">Text you wanted goes here</text>
  </parse>
</api>

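Note the query parameters: page=Article_Title_Goes_Here, format=xml, prop=text, and section=0 (section 0 is the lead section, the part of the article before the first heading).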
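As a rough sketch of how the request above could be made from PHP: the helper names (buildParseUrl, extractSectionHtml, firstParagraph, fetchFirstParagraph) and the User-Agent string are made up for illustration, and the code assumes the response has the XML shape shown above. It uses https rather than http, and skips empty paragraphs, since the lead section's HTML can contain images and infobox tables before the first real paragraph.

```php
<?php
// Hypothetical helper: build the action=parse URL for the lead section of a page.
function buildParseUrl(string $title): string
{
    $query = http_build_query([
        'action'  => 'parse',
        'page'    => $title,
        'format'  => 'xml',
        'prop'    => 'text',
        'section' => 0,
    ], '', '&');
    return 'https://en.wikipedia.org/w/api.php?' . $query;
}

// Hypothetical helper: return the HTML held in <api><parse><text>, or null.
function extractSectionHtml(string $xml): ?string
{
    $doc = simplexml_load_string($xml);
    if ($doc === false || !isset($doc->parse->text)) {
        return null;
    }
    return (string) $doc->parse->text;
}

// Hypothetical helper: plain text of the first non-empty <p> in an HTML fragment.
function firstParagraph(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate markup libxml does not recognise
    $dom->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    foreach ($dom->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            return $text; // tables and images are not <p>, so they are skipped
        }
    }
    return null;
}

// Hypothetical wrapper: fetch the page and return its first paragraph, or null.
function fetchFirstParagraph(string $title): ?string
{
    // Assumption: a descriptive User-Agent is sent; the value here is a placeholder.
    $context = stream_context_create([
        'http' => ['user_agent' => 'FirstParagraphExample/0.1', 'timeout' => 10],
    ]);
    $xml = file_get_contents(buildParseUrl($title), false, $context);
    if ($xml === false) {
        return null;
    }
    $html = extractSectionHtml($xml);
    return $html === null ? null : firstParagraph($html);
}
```

Calling fetchFirstParagraph('Stack_overflow') would then return the paragraph as plain text, or null if the request or parsing fails. The same fetch could be done with cURL instead of file_get_contents.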

Yotam Omer
  • Is there a way to skip all the extra content and just get the first intro paragraph of the page? I seem to be getting the image, the right-side table details, etc. – Harsha M V Jul 01 '14 at 19:30
-3

I would use file_get_contents('http://wikipedia.com/'.$rest_of_url)

Then just use string parsing to select everything from the first <p> to the first </p>:

http://php.net/manual/en/function.substr.php
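
A minimal sketch of that string-parsing idea, assuming the markers that were stripped from the post are the first <p> and the following </p>. The function name firstParagraphBySubstr is hypothetical. It matches the literal tag only, so a paragraph opened as <p class="..."> would be missed, and the result still contains any inline HTML.

```php
<?php
// Hypothetical helper: return the raw contents of the first <p>...</p> pair, or null.
function firstParagraphBySubstr(string $html): ?string
{
    $open = strpos($html, '<p>');       // literal match; <p class="..."> is not found
    if ($open === false) {
        return null;
    }
    $contentStart = $open + strlen('<p>');
    $close = strpos($html, '</p>', $contentStart);
    if ($close === false) {
        return null;
    }
    return substr($html, $contentStart, $close - $contentStart);
}
```

Passing the result through strip_tags would remove the remaining inline markup.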
cyrusv
  • it escaped: use string parsing to select everything between the first `<p>` and `</p>` using `substr` – cyrusv Feb 21 '12 at 17:29