The results that Google shows are based on a massive amount of data, which I guess is built on signals such as "what X, who searched for Y, also searched for" and "what other people similar to X, who also searched for Y, searched for". In addition, there may be some reliance on semantic information coming from Freebase.
In an effort to understand which properties Google shows in its infoboxes (i.e. why, when we search for France, we get a card with a map, flag, capital, population, etc., out of the hundreds of properties related to France), I created a "Knowledge Base Extractor" that parses the Google infobox and exposes the data as RDF using the Fresnel Vocabulary.
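As a rough illustration of what "exposing the data as RDF using the Fresnel Vocabulary" could look like, the sketch below renders a Fresnel lens in Turtle listing the properties shown for a concept. The function name `fresnel_lens`, the `example.org` lens IRI and the choice of DBpedia property IRIs are assumptions made for this example; the actual output of the extractor may be shaped differently.

```python
# Minimal sketch, not the extractor's actual serializer.
# Assumption: one fresnel:Lens per concept, listing the infobox properties in order.
FRESNEL_NS = "http://www.w3.org/2004/09/fresnel#"


def fresnel_lens(lens_iri: str, class_iri: str, property_iris: list[str]) -> str:
    """Return a Turtle document describing a Fresnel lens for one concept."""
    # fresnel:showProperties takes an RDF list, written in Turtle as ( ... ).
    shown = " ".join(f"<{iri}>" for iri in property_iris)
    return (
        f"@prefix fresnel: <{FRESNEL_NS}> .\n\n"
        f"<{lens_iri}> a fresnel:Lens ;\n"
        f"    fresnel:classLensDomain <{class_iri}> ;\n"
        f"    fresnel:showProperties ( {shown} ) .\n"
    )


if __name__ == "__main__":
    print(
        fresnel_lens(
            "http://example.org/lens/Country",  # hypothetical lens IRI
            "http://dbpedia.org/ontology/Country",
            [
                "http://dbpedia.org/ontology/capital",
                "http://dbpedia.org/ontology/populationTotal",
            ],
        )
    )
```

Here the order of the IRIs in the list stands in for the order in which the infobox displays the properties.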
The implemented algorithm is as follows:
- Query DBpedia for all concepts (types) for which there is at least one instance that has a link to a Freebase ID
- For each of these concepts pick (n) instances randomly
- For each instance, issue a Google Search query:
  - if an infobox is available, scrape it to extract the properties
  - if no infobox is available, check whether Google suggests "do you mean ... ?" and, if so, follow the link and look for an infobox
  - if no infobox or correction is available, disambiguate the query with the concept (type) and check whether an infobox is returned
  - if Google suggests disambiguation in an infobox, parse all the links in it; ideally, find which suggestion maps to the concept (type) currently in use by checking the Freebase-DBpedia mappings
- Cluster properties for each concept
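The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the extractor's actual code: `SearchResult`, `find_infobox`, `cluster_properties` and the injected `search` and `matches_concept` callables are hypothetical names, "clustering" is simplified to counting how often each property appears per concept, and the SPARQL query assumes that DBpedia links to Freebase through `owl:sameAs`.

```python
import random
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed query for the first step (not executed here): concepts with at least
# one instance linked to Freebase. The owl:sameAs link shape is an assumption.
TYPES_QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?type WHERE {
  ?s a ?type ; owl:sameAs ?fb .
  FILTER(STRSTARTS(STR(?fb), "http://rdf.freebase.com/ns/"))
}
"""


@dataclass
class SearchResult:
    """Hypothetical parsed result page for one search query."""
    infobox: Optional[dict] = None       # property label -> value
    suggestion: Optional[str] = None     # "do you mean ...?" query, if any
    disambiguation: list = field(default_factory=list)  # queries behind disambiguation links


def find_infobox(label: str, concept: str,
                 search: Callable[[str], SearchResult],
                 matches_concept: Callable[[str, str], bool] = lambda link, concept: True
                 ) -> Optional[dict]:
    """Walk the fallback chain: direct hit, correction, typed query, disambiguation."""
    seen = [search(label)]
    if seen[0].infobox:
        return seen[0].infobox
    if seen[0].suggestion:                      # follow the suggested correction
        seen.append(search(seen[0].suggestion))
        if seen[-1].infobox:
            return seen[-1].infobox
    seen.append(search(f"{label} {concept}"))   # disambiguate with the concept
    if seen[-1].infobox:
        return seen[-1].infobox
    for result in seen:                         # try disambiguation links last
        for link in result.disambiguation:
            if matches_concept(link, concept):  # e.g. via Freebase-DBpedia mappings
                candidate = search(link)
                if candidate.infobox:
                    return candidate.infobox
    return None


def cluster_properties(instances_by_concept: dict, search, n: int = 3,
                       rng: Optional[random.Random] = None) -> dict:
    """Sample up to n instances per concept and count the properties seen."""
    rng = rng or random.Random()
    clusters = {}
    for concept, instances in instances_by_concept.items():
        counts = Counter()
        for label in rng.sample(instances, min(n, len(instances))):
            infobox = find_infobox(label, concept, search)
            if infobox:
                counts.update(infobox.keys())
        clusters[concept] = counts
    return clusters
```

Passing `search` in as a callable keeps the fallback logic separate from the scraping itself, so the chain can be exercised against canned results before it is pointed at live pages.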
I also capture the "people searched for" section, but you might want to tweak that part further.
Also note that you may need to check the CSS selectors for the infobox, as Google changes them often (they may be auto-generated). They are set in options.json:
"knowledgeBox" : "#kno-result",
"knowledgeBox_disambiguate" : ".kp-blk",
"property" : "._Nl",
"property_value" : ".kno-fv",
"label" : ".kno-ecr-pt",
"description" : ".kno-rdesc",
"type" : "._kx",
"images" : ".bicc",
"special_property" : ".kno-sh",
"special_property_value" : "._Zh",
"special_property_value_link" : "a._dt"
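Because these selectors go stale, a quick check against a saved result page can show which ones no longer match anything. The sketch below is one possible way to do that with the Python standard library only; `stale_selectors` is a hypothetical helper, not part of the extractor, and it understands only the simple `tag#id.class` forms used above (no descendant or attribute selectors).

```python
import re
from html.parser import HTMLParser

# Matches the simple selector shapes used in the options: "#id", ".class", "tag.class".
_SIMPLE = re.compile(r"^([A-Za-z][\w-]*)?(?:#([\w-]+))?(?:\.([\w-]+))?$")


class _ElementCollector(HTMLParser):
    """Record (tag, id, classes) for every start tag in the document."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = set((attributes.get("class") or "").split())
        self.elements.append((tag, attributes.get("id"), classes))


def stale_selectors(selectors: dict, html: str) -> list:
    """Return the option names whose selector matches no element in html."""
    collector = _ElementCollector()
    collector.feed(html)
    stale = []
    for name, selector in selectors.items():
        match = _SIMPLE.match(selector)
        if match is None:
            raise ValueError(f"unsupported selector for {name!r}: {selector!r}")
        tag, element_id, css_class = match.groups()
        found = any(
            (tag is None or t == tag)
            and (element_id is None or i == element_id)
            and (css_class is None or css_class in c)
            for t, i, c in collector.elements
        )
        if not found:
            stale.append(name)
    return stale


if __name__ == "__main__":
    page = '<div id="kno-result"><span class="_Nl">Capital</span></div>'
    print(stale_selectors({"knowledgeBox": "#kno-result", "property_value": ".kno-fv"}, page))
```

Any name it reports is a candidate for updating in options.json, although a selector can also fail to match simply because the saved page has no infobox.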