I'm working on a PHP project where I create a more readable version of a text transcript for a judicial inquiry, and one thing I'd really like to do is have photos depicting each speaker.

Some of them are public figures (i.e., well-known UK judges and lawyers, and UK politicians); others are journalists, and some are celebrities.

Wikipedia seems like the best source for this (though I may be wrong), but I'm really unfamiliar with the MediaWiki API.

So, my questions:

  1. Is Wikipedia the best thing to use for this task? Or is there a database of headshots somewhere with a very wide variety of subjects? If the latter, where's its API documentation?
  2. If Wikipedia, what API call would I use for fetching an article's main image URL?
  3. Lastly, how would I translate a string like "SIR PAUL STEPHENSON" into the form Wikipedia uses for the page title, i.e., "Paul_Stephenson_(police_officer)"?

Note that I'm aware special cases will come up where no photo exists on Wikipedia or where disambiguation is needed; I'll deal with those on a per-case basis.

Thanks!

aendra
  • The third one won't really be solvable seeing how Wikipedia has HUGE variance in its page titles... – Jon Egeland Mar 14 '12 at 00:48
  • @Jon -- Admitted, but think there's any way to do a search for the original string, then follow whichever link scores the highest relevancy? – aendra Mar 14 '12 at 00:50
  • Third one could be solvable using the search api. Operative word = could. – Cyclone Mar 14 '12 at 00:50

2 Answers


Google Images has a face filter:

https://www.google.com/search?tbm=isch&q=SIR+PAUL+STEPHENSON&tbs=itp:face

I'm not sure whether you're allowed to use their API for this kind of thing, though; you'll need to read their ToS.

Alix Axel
  • WHAT. That's awesome. Will read the TOS, that's like a million times simpler than what I was proposing. – aendra Mar 14 '12 at 11:57
  • Darn, it seems that's against the ToS. See the point about automated requests at http://code.google.com/apis/errors. Would've been great if it wasn't, though! – aendra Mar 14 '12 at 17:34
  • @aendrew: Couldn't you make this process be initiated by an end-user? If not, I can think of http://www.freebase.com/view/m/0c53qn and http://www.facesaerch.com/f/sir+paul+stephenson, but I think it wouldn't be as easy or as powerful. – Alix Axel Mar 15 '12 at 02:11
  • facesaerch uses Google Image API also. What do you mean by "make this process initiated by an end-user"? Think "grab four images on page load and then cache them" would qualify? Now that I think of it, that really should be acceptable. I should apply for an API key and see if I can get it to work. – aendra Mar 16 '12 at 19:40

You can use the search API to find the most likely article for a name. AFAIK there is no sane API for finding the first image in an article, though (the images API returns images in alphabetical order and includes images from templates), so your best bet is to parse either the HTML (the portrait is usually the first large image) or the wikitext (most infoboxes use a parameter called image). You can then use the imageinfo API to get the image URL from the image page name.
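
A minimal sketch of those steps, in Python for brevity (the same request parameters should carry over to a PHP HTTP client). It only builds the request URLs and picks values out of already-decoded JSON, so fetching is left to you. The English Wikipedia endpoint, the helper names, and the infobox regex are my assumptions; the regex only handles the common one-parameter-per-line infobox layout.

```python
import re
from urllib.parse import urlencode

# Assumption: English Wikipedia; other wikis expose the same api.php path.
API = "https://en.wikipedia.org/w/api.php"


def search_url(name):
    """Build a search request; the first hit is the most likely article."""
    params = {"action": "query", "list": "search",
              "srsearch": name, "srlimit": 1, "format": "json"}
    return API + "?" + urlencode(params)


def top_title(search_response):
    """Return the title of the best search hit, or None if nothing matched."""
    hits = search_response.get("query", {}).get("search", [])
    return hits[0]["title"] if hits else None


def wikitext_url(title):
    """Build a request for the current wikitext of an article."""
    params = {"action": "query", "prop": "revisions", "rvprop": "content",
              "rvslots": "main", "titles": title, "format": "json"}
    return API + "?" + urlencode(params)


# Matches lines such as "| image = Some file.jpg" (hypothetical heuristic;
# infoboxes written on a single line are not handled).
IMAGE_PARAM = re.compile(r"^[ \t]*\|[ \t]*image[ \t]*=[ \t]*(\S.*?)[ \t]*$",
                         re.MULTILINE | re.IGNORECASE)


def infobox_image(wikitext):
    """Return the file name from the first 'image' parameter, or None."""
    match = IMAGE_PARAM.search(wikitext)
    if not match:
        return None
    value = re.sub(r"^\[\[", "", match.group(1))   # drop a leading [[
    value = value.split("|")[0].rstrip("]")        # drop size/caption options
    value = re.sub(r"^(File|Image):", "", value, flags=re.IGNORECASE)
    return value.strip() or None


def imageinfo_url(filename):
    """Build an imageinfo request for the file's image page."""
    params = {"action": "query", "prop": "imageinfo", "iiprop": "url",
              "titles": "File:" + filename, "format": "json"}
    return API + "?" + urlencode(params)


def image_url(imageinfo_response):
    """Return the first image URL found in an imageinfo response, or None."""
    pages = imageinfo_response.get("query", {}).get("pages", {})
    for page in pages.values():
        info = page.get("imageinfo")
        if info:
            return info[0]["url"]
    return None
```

The chain would be `search_url` → `top_title` → `wikitext_url` → `infobox_image` → `imageinfo_url` → `image_url`, with any `None` along the way being one of the special cases to handle by hand.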

All in all, you are probably better off with Flickr.

Tgr