What is the best method for location disambiguation with GeoNames data?
GeoNames search uses some scoring algorithm, but it is not open source and I am not sure it is very sophisticated. For example, for "soma, ca" it returns Soma Lake in Canada, which does not even have a Wikipedia article, instead of the very popular SoMa neighborhood in San Francisco.
I have also found some papers on Google Scholar, but they seem quite shallow and similar to my own heuristics, such as scoring by something like log(population) + 1000*hasWikipedia(article) + isCity*100 + isCapital*10.
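To make the heuristic concrete, here is a minimal Python sketch of that kind of scoring. The weights are the ones from the formula above. Everything else is an assumption made for illustration: the `Candidate` record and its field names are hypothetical (they are not GeoNames field names), the logarithm is taken as the natural log, a missing population is treated as 1 to avoid log(0), and the two sample records are made up rather than taken from real GeoNames output.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record; field names are illustrative, not GeoNames' own.
    name: str
    population: int
    has_wikipedia: bool
    is_city: bool
    is_capital: bool


def score(c: Candidate) -> float:
    # Assumption: natural log, with population clamped to 1 so that
    # places with no recorded population contribute 0 instead of failing.
    pop_term = math.log(max(c.population, 1))
    # Booleans count as 0 or 1, so each flag adds its fixed weight.
    return pop_term + 1000 * c.has_wikipedia + 100 * c.is_city + 10 * c.is_capital


def disambiguate(candidates):
    # Pick the highest-scoring candidate for an ambiguous place name.
    return max(candidates, key=score)


# Made-up sample candidates for the query "soma, ca".
candidates = [
    Candidate("Soma Lake", 0, False, False, False),
    Candidate("SoMa, San Francisco", 0, True, False, False),
]
best = disambiguate(candidates)
print(best.name)
```

With these sample records the Wikipedia flag alone decides the ranking, which also shows how coarse the heuristic is: the fixed weights dominate the population term almost everywhere.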
My domain is travel articles, so my scoring function should favor the most probable tourist places: cities and places of interest (Disneyland, the Colosseum, Big Ben).
Do you know of any important papers in this field, or of the algorithms used in production by Google Maps, Yahoo, Bing, or even GeoNames?