I am using the popular word cloud library with source: https://github.com/jasondavies/d3-cloud
I am using a clone of this block: http://bl.ocks.org/blockspring/847a40e23f68d6d7e8b5
For my data, I would like to set the maximum number of words the word cloud displays. The cloud has some built-in functions for rotation, font size, spiral method, etc. However, there does not appear to be any built-in means of setting the maximum number of words to be displayed.
I think it would be more computationally efficient to simply feed it a subset of the original word count. I didn't see any .sort calls, so I'm not sure whether the word_count object is already sorted by frequency before it goes to cloud.js.
If cloud.js itself sorts the word_count object it accepts (by frequency, tf-idf, or whatever it uses), then I would have to wait until after it has built the list to take the top k words, which means it would still have iterated through my whole text file.
I still think that if I can display only the top k words (top as in most frequent, excluding the grammar words found in common_words), let's say 20, I will at least speed up the rendering of the visual (I'm not sure about speeding up the actual algorithm).
If that was not clear, let me explain it visually. It seems that the more frequently a word appears, the bigger its font size. I think that is an intuitive way to grasp cloud.js, so the top k words will be the k words with the largest font size.
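To make the idea concrete, here is a minimal sketch of the filtering I have in mind. It assumes word_count is a plain object mapping each word to its count and that common_words is an array of strings; I have not confirmed that this is how the block builds them. The helper name topKWords and the key/value shape of the result are my own, for illustration only.

```javascript
// Hypothetical helper (not part of d3-cloud or the block):
// keep only the k most frequent words, skipping common words.
// Assumes wordCount looks like { word: count, ... }.
function topKWords(wordCount, k, commonWords) {
  const stop = new Set(commonWords || []);
  return Object.keys(wordCount)
    .filter(function (word) { return !stop.has(word); })
    .map(function (word) { return { key: word, value: wordCount[word] }; })
    .sort(function (a, b) { return b.value - a.value; }) // most frequent first
    .slice(0, k);
}

// Made-up counts, just to show the shape of the result.
const sample = { the: 50, cloud: 12, word: 9, font: 7, spiral: 3 };
const top = topKWords(sample, 3, ["the"]);
console.log(top);
```

If something like this is the right approach, I assume the returned array is what I would hand to the layout's .words(...) call in place of the full list, but I am not sure where in the block that should happen.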
So can someone with experience in this kind of visualization tell me where to tweak the code for returning top k words and how?
Note: I originally posted this question on the GitHub page, but it was marked as off-topic and I was advised to post here. My initial fear was that it would be marked as too vague for Stack Overflow, so I have tried to make the question less abstract and to provide as much information as I could. Please bear this in mind.
Thank you