
I have some experience with Python, but I have never used try/except blocks to catch errors, due to a lack of formal training.

I am working on extracting a few articles from Wikipedia. For this I have an array of titles, a few of which do not correspond to any article or search result. I would like the page-retrieval function to just skip those few names and continue running the script on the rest. Reproducible code follows.

import wikipedia
# This one works.
links = ["CPython"]
test = [wikipedia.page(link, auto_suggest=False) for link in links]
test = [testitem.content for testitem in test]
print(test)

# The sequence breaks down if there is no Wikipedia page.
links = ["CPython","no page"]
test = [wikipedia.page(link, auto_suggest=False) for link in links]
test = [testitem.content for testitem in test]
print(test)

The library implements the retrieval with a function like the one below. Normally editing it would be really bad practice, but since this is just a one-off data extraction, I am willing to change the local copy of the library to get it to work. Edit: I have included the complete function now; a sketch of one possible local edit follows the listing.

def page(title=None, pageid=None, auto_suggest=True, redirect=True, preload=False):
  '''
  Get a WikipediaPage object for the page with title `title` or the pageid
  `pageid` (mutually exclusive).

  Keyword arguments:

  * title - the title of the page to load
  * pageid - the numeric pageid of the page to load
  * auto_suggest - let Wikipedia find a valid page title for the query
  * redirect - allow redirection without raising RedirectError
  * preload - load content, summary, images, references, and links during initialization
  '''
  if title is not None:
    if auto_suggest:
      results, suggestion = search(title, results=1, suggestion=True)
      try:
        title = suggestion or results[0]
      except IndexError:
        # if there is no suggestion or search results, the page doesn't exist
        raise PageError(title)
    return WikipediaPage(title, redirect=redirect, preload=preload)
  elif pageid is not None:
    return WikipediaPage(pageid=pageid, preload=preload)
  else:
    raise ValueError("Either a title or a pageid must be specified")
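
For concreteness, here is a sketch of what such a local edit might look like, keeping only the title branch shown above (the `return None` is my own change, not the library's actual behavior):

# Sketch of a hypothetical local edit, not the shipped library code:
# make the auto-suggest branch return None instead of raising, so that
# missing pages can be filtered out by the caller.
if title is not None:
  if auto_suggest:
    results, suggestion = search(title, results=1, suggestion=True)
    try:
      title = suggestion or results[0]
    except IndexError:
      return None  # instead of: raise PageError(title)
  return WikipediaPage(title, redirect=redirect, preload=preload)

Callers would then drop the `None`s, e.g. `pages = [p for p in (page(l, auto_suggest=False) for l in links) if p is not None]`. Note, though, that with `auto_suggest=False` this branch is skipped entirely and the `PageError` is raised deeper inside `WikipediaPage`, so catching the exception at the call site, as in the answers below, is the more reliable fix.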

What should I do to retrieve only the pages that do not give the error? Maybe there is a way to filter out all items in the list that give this error, or an error of some kind. Returning "NA" or similar would be fine for pages that don't exist. Skipping them without notice would be fine too. Thanks!

puslet88

2 Answers


The function `wikipedia.page` will raise a `wikipedia.exceptions.PageError` if the page doesn't exist. That's the error you want to catch.

import wikipedia
links = ["CPython", "no page"]
test = []
for link in links:
    try:
        # try to load the Wikipedia page
        page = wikipedia.page(link, auto_suggest=False)
        test.append(page)
    except wikipedia.exceptions.PageError:
        # if a PageError was raised, ignore it and continue to the next link
        continue

You have to wrap the call to `wikipedia.page` in a try block, so I'm afraid you can't use a list comprehension directly.
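
If you do want to keep a list comprehension, one workaround (just a sketch; `safe_page` is a helper name I made up, not part of the library) is to move the try/except into a small function that returns `None` on failure, then filter the `None`s out:

import wikipedia

def safe_page(link):
    # Return the page for `link`, or None if no such page exists.
    try:
        return wikipedia.page(link, auto_suggest=False)
    except wikipedia.exceptions.PageError:
        return None

links = ["CPython", "no page"]
pages = [p for p in (safe_page(link) for link in links) if p is not None]
test = [page.content for page in pages]
print(test)

Whether that is clearer than the plain loop above is a matter of taste.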

Mel
  • It's too early, @puslet88 this should be what you need. – cssko Nov 24 '15 at 15:37
  • Thanks for your help, @tmoreau and @cssko, really appreciate it. I think I may be missing something. This code runs without errors, but only because `auto_suggest=True` suggests "Main page" instead of "no page". If I turn auto_suggest off, then it still breaks. – puslet88 Nov 24 '15 at 15:39
  • How does it break? I just removed `auto_suggest` because I was too lazy to type it; I have no idea what the function actually does. – Mel Nov 24 '15 at 15:41
  • Yes, sorry, I included the whole function. And unfortunately I have to admit that I am only 90% sure that this is the function to deal with. When `auto_suggest=True`, the version with no error catching works too. Sorry about the confusion. – puslet88 Nov 24 '15 at 15:43
  • Yes, thank you so much, this fixes both answers! Now try and except are also somewhat clearer to me. Thanks again! – puslet88 Nov 24 '15 at 15:51

Understand that this is bad practice, but for a one-off, quick-and-dirty script you can just:

Edit: Wait, sorry, I've just noticed the list comprehension. I'm actually not sure this will work without breaking it down:

links = ["CPython", "no page"]
test = []
for link in links:
    try:
        page = wikipedia.page(link, auto_suggest=False)
        test.append(page)
    except wikipedia.exceptions.PageError:
        pass
test = [testitem.content for testitem in test]
print(test)

`pass` essentially tells Python to trust you and ignore the error so that it can continue on about its day.
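
Since the question mentions that returning "NA" for missing pages would be fine too, here is a small variant of the loop above (a sketch, using the question's own "NA" placeholder) that appends a marker instead of skipping, so the result list stays aligned with `links`:

import wikipedia

links = ["CPython", "no page"]
test = []
for link in links:
    try:
        test.append(wikipedia.page(link, auto_suggest=False).content)
    except wikipedia.exceptions.PageError:
        # No such page: keep a placeholder so positions still line up.
        test.append("NA")
print(test)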

cssko
  • Thanks, I'll keep in mind that this is bad practice. The current code did not fix it for me, though. I still get `wikipedia.exceptions.PageError: Page id "no page" does not match any pages. Try another id!` Could be a version issue; I'm using Python 3.5.0. – puslet88 Nov 24 '15 at 15:27
  • I would advise against putting things in your `try` block that won't fail anyway, like assignments. Also, the IndexError is caught and a PageError is then thrown, so you can't catch the IndexError 'higher' up. – Noxeus Nov 24 '15 at 15:32
  • A try block is not bad practice, as long as you only catch specific errors. A bare `except:` would skip any error, and that is bad practice. Anyway, the problem here is not an `IndexError`. – Mel Nov 24 '15 at 15:33 (see the sketch below)
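
To illustrate Mel's point above (a sketch; the deliberate typo is mine): catching the specific exception lets only the expected failure through, while a bare `except:` also swallows unrelated bugs:

import wikipedia

links = ["CPython", "no page"]

# Good: only the expected failure is ignored.
for link in links:
    try:
        print(wikipedia.page(link, auto_suggest=False).title)
    except wikipedia.exceptions.PageError:
        pass  # missing page, fine to skip

# Bad: a bare except hides real bugs too.
for link in links:
    try:
        print(wikipedia.pge(link, auto_suggest=False).title)  # typo raises AttributeError
    except:
        pass  # the AttributeError is silently swallowed as well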