I am currently working on a university project on the theme of search engines. For this purpose we were given access to a database of scientific publications (http://dblp.uni-trier.de).
It is a 2 GB XML file whose entries look something like this:
<article key="GottlobSR96">
<author>Georg Gottlob</author>
<author>Michael Schrefl</author>
<author>Brigitte Röck</author>
<title>Extending Object-Oriented Systems with Roles.</title>
<pages>268-296</pages>
<year>1996</year>
<volume>14</volume>
<journal>TOIS</journal>
<number>3</number>
<url>db/journals/tois/tois14.html#GottlobSR96</url>
</article>
As you can see, each "article" tag contains information such as the authors, the title of the paper, and the year of publication. My job now is to implement a Java solution that takes search terms of different categories (author, university, title) as input and provides the user with additional information.
For example, if you enter the name of a professor, it should return data such as his date of birth, the university he works at, his number of publications, etc.
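The publication count, at least, could come from the DBLP file itself rather than from the web. Below is a minimal sketch of a streaming (StAX) pass over the file, assuming that every entry has the shape of the sample above and sits inside a single root element. The class `DblpArticleScanner` and its method names are hypothetical. The real dump may declare a DTD with character entities, which can require extra parser configuration not shown here.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of any DBLP tooling.
class DblpArticleScanner {

    /** Plain holder for the fields of one <article> element. */
    static final class Article {
        String key;
        String title;
        String year;
        final List<String> authors = new ArrayList<>();
    }

    /** Streams through the XML and returns the articles that list the given author. */
    static List<Article> findByAuthor(InputStream xml, String author) {
        List<Article> hits = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            Article current = null;                 // non-null while inside an <article>
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Assumption: fields contain plain text only. Inline markup
                    // inside a title would reset the buffer and truncate it.
                    text.setLength(0);
                    if ("article".equals(reader.getLocalName())) {
                        current = new Article();
                        current.key = reader.getAttributeValue(null, "key");
                    }
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    // Text may arrive in several chunks, so append rather than assign.
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT && current != null) {
                    String value = text.toString().trim();
                    switch (reader.getLocalName()) {
                        case "author": current.authors.add(value); break;
                        case "title": current.title = value; break;
                        case "year": current.year = value; break;
                        case "article":
                            if (current.authors.contains(author)) {
                                hits.add(current);
                            }
                            current = null;
                            break;
                        default: break;             // other fields are ignored here
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML input", e);
        }
        return hits;
    }
}
```

Calling `findByAuthor` with a `FileInputStream` on the downloaded file and taking `size()` of the result would give the number of articles for that author. Because the reader only ever holds one article in memory, the 2 GB size should not require loading the whole document.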
I suppose the lookup for a person would work by using a Google API to find that person's entry on the university homepage and then somehow parsing the page to extract the needed information. For universities, there should be a Wikipedia page.
I have already tried the MediaWiki API but couldn't figure out how to get only the specific information I want (I could only get the intro paragraph).
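One direction that might get past the intro paragraph is to request the page's raw wikitext and read individual infobox parameters from it. The sketch below is only that: it assumes the MediaWiki `action=parse` module with `prop=wikitext`, assumes the article has an infobox with one `| name = value` pair per line, and uses `established` purely as an example parameter name, which may differ between articles. The class `WikiInfobox` is hypothetical, and the HTTP request and JSON decoding are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the MediaWiki API itself.
class WikiInfobox {

    /** Builds a request URL that asks the English Wikipedia for a page's wikitext as JSON. */
    static String wikitextUrl(String pageTitle) {
        return "https://en.wikipedia.org/w/api.php?action=parse&prop=wikitext&format=json&page="
                + URLEncoder.encode(pageTitle, StandardCharsets.UTF_8);
    }

    /**
     * Looks for a line of the form "| field = value" in the wikitext and
     * returns the value, or an empty Optional if the field is missing or blank.
     * Assumption: each parameter sits on its own line; values may still
     * contain wiki markup that needs further cleaning.
     */
    static Optional<String> infoboxField(String wikitext, String field) {
        Pattern pattern = Pattern.compile(
                "^[ \\t]*\\|[ \\t]*" + Pattern.quote(field) + "[ \\t]*=[ \\t]*(.*)$",
                Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(wikitext);
        if (matcher.find()) {
            String value = matcher.group(1).trim();
            if (!value.isEmpty()) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

With the wikitext of a university article in hand, `infoboxField(wikitext, "established")` would return the founding year if that parameter is present. A regular expression is a deliberately crude choice here; nested templates and multi-line values would need a real wikitext parser.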
I've never worked on a project of this scale, so I don't really have a clue how to integrate third-party APIs, libraries, etc. into my own code. So I guess my question is:
How do I get specific information based on a Google search, whether through Wikipedia or otherwise?