I want to get the first paragraph of a Wikipedia article, so I'm using wikitools.

from wikitools import wiki
from wikitools import api
from wikitools import page

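# Connect to the English Wikipedia API endpoint
wikiobj = wiki.Wiki("http://en.wikipedia.org/w/api.php")
wikipage = page.Page(wikiobj, title="Office_Space")
# Passing True asks wikitools to expand templates in the returned wikitext
wikidata = wikipage.getWikiText(True)
print wikidata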

Here is the result from wikitools. It contains a lot of markup tags, and I don't want all of that.

I need only the following part of the article (copied and pasted from Wikipedia). Is that possible? Or is there an alternative available? Thank you so much.

   Office Space is a 1999 American comedy film satirizing work life in a typical 1990s software company. Written and directed by Mike Judge, it focuses on a handful of individuals fed up with their jobs portrayed by Ron Livingston, Jennifer Aniston, Gary Cole, David Herman, Ajay Naidu, and Diedrich Bader.
   The film's sympathetic depiction of ordinary IT workers garnered a cult following within that field, but also addresses themes familiar to white collar employees in general.
   Shot in Las Colinas and Austin, Texas, Office Space is based on Judge's Milton cartoon series. It was his first foray into live action film and second full length motion picture release.

2 Answers

The template parser in wikipedia_utils (referenced in this article on how to scrape and parse Wikipedia) looks like it'll allow you to put everything wikitools returns into a Python data structure, from which you can extract just the bits you want.

Edit: You might also find the Python library mwlib useful for this purpose, as described in this SO answer.
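Another route, different from both wikitools and mwlib, is to ask the MediaWiki API for plain text directly through the TextExtracts extension (prop=extracts with exintro and explaintext), which skips the markup problem entirely. Below is a minimal Python 3 sketch using only the standard library. The helper names (build_intro_url, extract_intro, first_paragraph, fetch_intro) and the User-Agent string are made up for illustration, and the response handling assumes the usual JSON layout with pages keyed under query.pages.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_intro_url(title):
    """Build a query URL asking for the plain-text lead section of a page."""
    params = {
        "action": "query",
        "prop": "extracts",   # provided by the TextExtracts extension
        "exintro": 1,         # only the section before the first heading
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects to the target page
        "format": "json",
        "titles": title,
    }
    return API_URL + "?" + urlencode(params)


def extract_intro(response):
    """Return the extract text from a decoded API response, or None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None


def first_paragraph(text):
    """Return the first non-empty line of a plain-text extract."""
    for paragraph in text.split("\n"):
        if paragraph.strip():
            return paragraph.strip()
    return ""


def fetch_intro(title):
    """Fetch the lead section over HTTP (needs network access)."""
    # The User-Agent value is a placeholder; replace it with your own.
    request = Request(build_intro_url(title),
                      headers={"User-Agent": "intro-sketch/0.1"})
    with urlopen(request) as reply:
        return extract_intro(json.loads(reply.read().decode("utf-8")))
```

Calling fetch_intro("Office_Space") should return the whole lead section as plain text; pass the result to first_paragraph if you only want the opening paragraph.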

Jonathan Deamer

I finally found a script that works. Thanks anyway :-D

https://github.com/Anorov/Imageboard-Spammer-Deluxe/blob/d735cc24468528bb6c6cd1a1447986e550478804/wikipedia.py