10

I am looking for a way to get the pronunciation of any given word by querying an API of some sort. Since Wiktionary is handy for finding pronunciations, I tried to use its API, but how do I get the pronunciation of a specific word?

It seems their API only allows getting the entire Wiki article.

deed02392
baik

3 Answers

7

Wiktionary doesn't have an API of its own. MediaWiki, the software that Wiktionary runs on, does have an API, but it is completely unaware of the structure and content of Wiktionary.

The best you can do is use the MediaWiki API to find the wiki page for the word you want, then look at its table of contents. If the table of contents has a section for the language you want, and within that a Pronunciation section, use another API call to get the wikitext of that section, which you will have to parse yourself. Different words may well use different templates, since Wiktionary is constantly evolving.
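
The two-step lookup described above could be sketched roughly as follows in Python. This is only an illustration, not an official Wiktionary client: the helper names are made up for this example, and the `{{IPA|...}}` template shape matched by `extract_ipa` is an assumption about how a given entry is written, which may not hold for every word.

```python
import json
import re
import urllib.parse
import urllib.request

# MediaWiki API endpoint of the English Wiktionary.
API_URL = "https://en.wiktionary.org/w/api.php"


def api_url(**params):
    """Build a MediaWiki API URL that asks for a JSON response."""
    params["format"] = "json"
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    # The User-Agent value is a placeholder for this example.
    req = urllib.request.Request(url, headers={"User-Agent": "pronunciation-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_pronunciation_section(sections, language):
    """Return the index of the Pronunciation section under the given language, or None."""
    inside = False
    for sec in sections:
        if int(sec["level"]) == 2:
            # Level-2 headings are the language sections (English, French, ...).
            inside = sec["line"] == language
        elif inside and sec["line"] == "Pronunciation":
            return sec["index"]
    return None


def extract_ipa(wikitext):
    """Collect IPA strings from {{IPA|...}} templates (assumed template shape)."""
    found = []
    for body in re.findall(r"\{\{IPA\|([^{}]*)\}\}", wikitext):
        for arg in body.split("|"):
            arg = arg.strip()
            # Keep only arguments that look like /phonemic/ or [phonetic] transcriptions.
            if arg.startswith(("/", "[")) and arg not in found:
                found.append(arg)
    return found


def pronunciations(word, language="English"):
    """Look up the table of contents, then fetch and parse the Pronunciation section."""
    toc = fetch_json(api_url(action="parse", page=word, prop="sections"))
    sections = toc.get("parse", {}).get("sections", [])
    index = find_pronunciation_section(sections, language)
    if index is None:
        return []
    sec = fetch_json(api_url(action="parse", page=word, prop="wikitext", section=index))
    return extract_ipa(sec["parse"]["wikitext"]["*"])
```

Calling `pronunciations("work")` makes two API requests. An empty list means either that no matching section was found or that the section uses a template this sketch does not recognise.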

There are also mailing lists for Wiktionary and for MediaWiki API.

hippietrail
  • 1
    Thanks for that API; I'd been XML-parsing the dictionary entry pages in my application. – Tortoise Nov 05 '12 at 01:40
  • 1
    @Tortoise: You're welcome. It would probably be easier these days if there were a way to do jQuery-style selectors on the HTML. You can get the HTML of the whole page or a single section minus most of the boilerplate either with some URL parameters or via the API. – hippietrail Nov 05 '12 at 03:36
  • The "jQuery-style" was just to [mess with me](http://stackoverflow.com/questions/13225135/advantages-of-jquery), right? ;) – Tortoise Nov 05 '12 at 03:38
  • @Tortoise: Not really. I know [there are implementations of the DOM API in languages other than JavaScript](http://stackoverflow.com/questions/12006490) and I know jQuery's selection stuff is from a separate project called [Sizzle](http://sizzlejs.com/). So without knowing much more I'm just not ruling out the possibility that somebody may have ported some subset of this stuff to PHP, or made something different that functions in a similar way. Another possibility is if there is some interface in existence between PHP and node.js ... – hippietrail Nov 05 '12 at 03:46
  • You mean, like, SimpleXML? I don't follow. – Tortoise Nov 05 '12 at 03:47
  • @Tortoise: I don't know of anything specific; I only know that there are many things out there I haven't heard about that you might be able to find with some searching. I'll keep a lookout too, though... Check out this old question which mentions a "phpQuery": [Is there a JQuery DOM manipulator/CSS selector equivalent class in PHP?](http://stackoverflow.com/questions/2722109) – hippietrail Nov 05 '12 at 03:49
5

You could build on the Wiktionary DBpedia project and send a SPARQL query like the following one to its SPARQL endpoint:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wt:<http://wiktionary.dbpedia.org/terms/>

SELECT DISTINCT ?spell ?pronounce
WHERE { 
  ?spell rdfs:label "work"@en ;
            wt:hasLangUsage ?use .

  ?use dc:language wt:English ;
          wt:hasPronunciation ?pronounce .
}

In this case, "work" is the word whose pronunciation you want to look up.
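
For illustration, here is a minimal Python sketch of sending that query over HTTP and reading the answer. It relies on the standard SPARQL protocol (a `query` parameter and the `application/sparql-results+json` result format). The endpoint URL is an assumption, since the answer does not spell it out, and the endpoint may no longer be reachable. The function names are invented for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL; replace it with whichever SPARQL endpoint you actually use.
ENDPOINT = "http://wiktionary.dbpedia.org/sparql"

QUERY_TEMPLATE = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wt: <http://wiktionary.dbpedia.org/terms/>

SELECT DISTINCT ?spell ?pronounce
WHERE {
  ?spell rdfs:label "{WORD}"@en ;
         wt:hasLangUsage ?use .
  ?use dc:language wt:English ;
       wt:hasPronunciation ?pronounce .
}
"""


def build_query(word):
    """Insert the word into the query, escaping backslashes and double quotes."""
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.replace("{WORD}", escaped)


def run_query(endpoint, query):
    """Send the query with HTTP GET and return the decoded JSON results (network request)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def literal_values(results, variable):
    """Pull the values bound to one variable out of a SPARQL JSON result set."""
    return [row[variable]["value"]
            for row in results["results"]["bindings"]
            if variable in row]
```

A call such as `literal_values(run_query(ENDPOINT, build_query("work")), "pronounce")` would then yield the pronunciation strings, provided the endpoint responds.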

EDIT:

A similar project is dbnary, which is more active and delivers more reliable results. You can use the SPARQL endpoint with the following query:

SELECT DISTINCT ?pronun
WHERE {
  ?form lemon:writtenRep "work"@en ;
        lexinfo:pronunciation ?pronun .
}
Karl Bartel
  • That SPARQL endpoint is currently a broken link. Do you know if it's just temporary or do you have an alternate link? I tried this query elsewhere with no results. I'm a fan of DBpedia but not very knowledgeable. – hippietrail Nov 05 '12 at 03:42
  • 2
    @hippietrail: The endpoint works fine for me. – Karl Bartel Nov 08 '12 at 20:54
  • OK I've moved from Seoul to Sydney and either it got fixed in that time or my location made a difference for some reason. I have noticed that the first letter of the first pronunciation is consistently missing though: "work" -> `"ɜː(r)k"@en`; "pork" -> `"ɔː(r)k"@en` – hippietrail Nov 09 '12 at 08:33
  • 2
    @hippietrail: I get four results for work. Three are the different pronunciations and one of them is the "Rhymes" entry from wiktionary, which is "missing" the first letter for obvious reasons. I don't know if listing the rhyme as hasPronunciation is a bug or the dbpedia people really consider it a pronunciation, but I expect the former. – Karl Bartel Nov 10 '12 at 09:54
  • 1
    Is there documentation somewhere for this API? For example I want to locate audio files with pronunciation. – reducing activity Feb 19 '15 at 05:20
  • @MateuszKonieczny: I don't see links to the audio files in either project. However, it should be possible to add them to dbnary without too much work. I assume the project would be happy about such a contribution. I would also like to have this information for my project [WikDict](http://www.wikdict.com). – Karl Bartel Feb 22 '15 at 13:26
2

Here is what I did for a similar situation.

  1. Read the tutorial "Scraping Links With PHP". It will teach you how to scrape links using PHP. Please do not just copy and paste; try to understand it.
  2. Now that we have our links, we need to separate the audio (*.ogg) ones from the normal links. For this, use the pathinfo function in PHP. The official documentation for pathinfo is a good place to start.
  3. Create an XML document from the result.
  4. Deliver the content using Ajax or any other preferred way.
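
Steps 1 to 3 can be sketched as below. Note that this sketch is in Python rather than the PHP the answer has in mind, so `posixpath.splitext` stands in for PHP's `pathinfo`. The class and function names and the XML element names are invented for this example.

```python
import posixpath
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def audio_links(html, extension=".ogg"):
    """Return only the links whose path ends in the given file extension."""
    parser = LinkCollector()
    parser.feed(html)
    result = []
    for link in parser.links:
        # Look at the path only, so that query strings do not hide the extension.
        path = urlparse(link).path
        if posixpath.splitext(path)[1].lower() == extension:
            result.append(link)
    return result


def links_to_xml(word, links):
    """Wrap the audio links in a small XML document (element names are made up)."""
    root = ET.Element("pronunciations", {"word": word})
    for link in links:
        ET.SubElement(root, "audio").text = link
    return ET.tostring(root, encoding="unicode")
```

Given the HTML of an entry page in a string `html`, `links_to_xml("work", audio_links(html))` produces the XML that step 4 would deliver to the client.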

Or you can give "http://api.forvo.com/demo" a try. It looks promising.

I will not give you the full answer, because then it would not be fun any more. I hope it helps.

Peter Mortensen
sam