Based on the question "How to create a word cloud from a corpus in Python?", I built a word cloud using amueller's library. However, I fail to see how I can feed the cloud with more than one set of text. Here is what I have tried so far:

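from wordcloud import WordCloud, STOPWORDS

# set.add() returns None, so build the extended stopword set explicitly
wc = WordCloud(background_color="white", max_words=2000, mask=alice_mask,
               stopwords=STOPWORDS | {"said"})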
wc.generate(set_of_words)
wc.generate("foo") # this overwrites the previous line of code
# but I would like this to be appended to the set of words

I cannot find any manual for the library, so I have no idea how to proceed. Do you? :)


In reality, as you see here: Dictionary with array of different types as value in Python, I have this data structure:

category = {  "World news": [2, "foo bla content of", "content of 2nd article"],
              "Politics": [1, "only 1 article here"],
              ...
}

and I would like to append "foo bla content of" and "content of 2nd article" to the word cloud.

gsamaras
  • Why not just append any words to the original set, and then generate that extended set? It'll be computationally expensive after a point, but is that a concern? – rabbit Jan 19 '16 at 21:51
  • @NBartley I have a dictionary, where every key has multiple values and every value is multiple words. I want to append all the words of the first 5 keys and I am not sure how to do what you are saying..I am new to Python. – gsamaras Jan 19 '16 at 21:56
  • Can you please provide some code that elaborates on this? It is unclear what data structure your `set_of_words` is. – rabbit Jan 19 '16 at 22:03
  • Oh @NBartley I am terribly sorry, I thought I had updated, please see my updated question. – gsamaras Jan 19 '16 at 22:08

2 Answers

From a brief skim over the class in https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py there isn't an update method, so you would need either to regenerate the wordcloud or add an update method.

Easiest way would probably be to maintain the original source text, and add to the end of this, then regenerate.
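
A minimal sketch of that idea, using the example strings from the question. The `texts` list and the `add_text` helper are hypothetical names introduced here for illustration, not part of the wordcloud library; the final call assumes `wc` was built as in the question, so it is left as a comment.

```python
# Keep every text that should appear in the cloud, then rebuild from all of it.
texts = []  # hypothetical accumulator, not part of the wordcloud library


def add_text(new_text):
    """Store new_text and return the full corpus as a single string."""
    texts.append(new_text)
    return " ".join(texts)


corpus = add_text("foo bla content of")
corpus = add_text("content of 2nd article")

# With the wordcloud package installed and `wc` built as in the question:
# wc.generate(corpus)
```

Each call to generate still rebuilds the cloud from scratch, so the cost grows with the size of the accumulated text.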

Lewis Fogden

The easiest solution would be to regenerate the wordcloud with the updated corpus.

To build a corpus with the text contained in your category data structure (for all topics) you could use this comprehension:

# Update the corpus
corpus = " ".join([" ".join(value[1:]) for value in category.values()])
# Regenerate the word cloud
wc.generate(corpus)

To build the word cloud for a single key in your data structure (e.g. Politics):

# Update the corpus
corpus = " ".join(category["Politics"][1:])
# Regenerate the word cloud
wc.generate(corpus)

Explanation:

  • join glues multiple strings together, separated by a given delimiter
  • [1:] takes all the elements from a list except the first one
  • dict.values() gives all the values in the dictionary (a list in Python 2, a view object in Python 3)

The expression " ".join([" ".join(value[1:]) for value in category.values()]) thus can be translated as:

First glue together all the elements per key except the first one (as it is a counter). Then glue together all the resulting strings.
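
The same approach can be limited to the first few keys, which is what one of the comments under the question asks for. This is only a sketch: `corpus_for_first_keys` is a hypothetical helper, the `category` dict reuses the two entries shown in the question, and "first" relies on dicts preserving insertion order (Python 3.7+).

```python
from itertools import islice

category = {
    "World news": [2, "foo bla content of", "content of 2nd article"],
    "Politics": [1, "only 1 article here"],
}


def corpus_for_first_keys(data, n):
    """Join the article texts of the first n keys (insertion order)."""
    parts = []
    for value in islice(data.values(), n):
        parts.extend(value[1:])  # skip the leading counter
    return " ".join(parts)


corpus = corpus_for_first_keys(category, 5)

# With the wordcloud package installed and `wc` built as in the question:
# wc.generate(corpus)
```

islice stops early when the dictionary has fewer than n keys, so asking for 5 keys on a 2-key dictionary simply uses both.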

rtemperv
  • I want to build a cloud only for a key, not for all keys. I *think* your code doesn't do this? Also a bit of explaining what your first line does would be nice. – gsamaras Jan 19 '16 at 23:05