
When searching for a paper in an online library such as Springer, the results also show related concepts automatically extracted from the paper, as well as a knowledge relationship graph built from these concepts. The following is a screenshot of the search output.

I would like to know which kinds of algorithms and software can generate this kind of output. Are there any open-source tools that can do this?

[Screenshot: search output showing the extracted concepts and their relationship graph]

raulk
user785099
  • [Neo4J](http://stackoverflow.com/questions/tagged/neo4j) would be one. The following topic might be interesting: http://stackoverflow.com/questions/1000162/has-anyone-used-graph-based-databases-http-neo4j-org – Val Jan 15 '16 at 16:18
  • The [Cross Validated](https://stats.stackexchange.com/) community might be able to help. – raulk Jan 15 '16 at 17:12
  • Hi Val, thank you so much for sharing this information with me; it is very useful. – user785099 Feb 09 '16 at 16:03

1 Answer


The algorithm being used is K-Means, an unsupervised clustering algorithm. Articles are clustered by topic. Some articles contain multiple topics, many of which are shared across articles. Those shared topics then become the branches emerging from the initial topic. scikit-learn (sklearn) is a great Python library that does clustering very well. R is also great for clustering. Hope this helps!
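As a rough illustration of what clustering articles by topic with scikit-learn could look like, here is a minimal sketch. It assumes TF-IDF features, and the toy corpus `docs` and the helper `cluster_documents` are hypothetical names made up for this example. It is not a reconstruction of whatever Springer actually runs.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for article abstracts.
docs = [
    "graph database query traversal nodes edges",
    "graph database index nodes relationships",
    "neural network training gradient descent",
    "neural network layers gradient backpropagation",
]


def cluster_documents(texts, n_clusters=2, seed=0):
    """Return one cluster label per text using TF-IDF features and K-Means."""
    # Turn each text into a sparse TF-IDF vector over the corpus vocabulary.
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts)
    # Fixed seed and explicit n_init keep the result reproducible.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features).tolist()


labels = cluster_documents(docs)
print(labels)  # one integer cluster id per document
```

Note that each document receives exactly one cluster label, so on its own this groups articles but does not produce a graph of concepts shared between them.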

  • It is extremely unlikely that K-Means is used here. First, K-Means (typically) has a one-to-one relationship, with each item assigned to exactly one cluster. That is not the case in the example, neither for words nor for documents. Second, K-Means suffers from *concentration of measure* and the *curse of dimensionality*, which makes it unsuited for text classification. – CAFEBABE Jan 17 '16 at 10:03