I want to generate word clouds for the different parts of speech in a text. The problem is that I don't want to tune the parameters by hand for each one. Since I am running this on a number of documents, doing so for every cloud is very tedious. Is there an automated approach?
from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts

def word_cloud_generator(part, data, description, scale):
    text = " ".join(data)
    # Image dimensions must be whole pixels.
    xlim = int(600 * scale)
    ylim = int(600 * scale)
    max_size = 200 * scale
    min_size = 2 * scale
    threshold = 4 * scale
    tags = make_tags(get_tag_counts(text), maxsize=max_size, minsize=min_size)
    # Drop tags whose font size falls below the threshold.
    tags = [a for a in tags if a['size'] > threshold]
    filename = description + "_" + part + '.png'
    create_tag_image(tags, filename, size=(xlim, ylim), fontname='Molengo',
                     background=(0, 0, 0), rectangular=True)

# Scale each cloud relative to the largest word list.
# float() avoids integer division under Python 2.
max_w = float(max(len(verbs), len(adjectives), len(adverbs), len(nouns)))
word_cloud_generator("verbs", verbs, description, len(verbs) / max_w)
word_cloud_generator("adjectives", adjectives, description, len(adjectives) / max_w)
word_cloud_generator("adverbs", adverbs, description, len(adverbs) / max_w)
word_cloud_generator("nouns", nouns, description, len(nouns) / max_w)
Please help.
Edit: The above code tries to select the word cloud parameters based on the number of words in the text, but the results I get aren't efficient enough. By efficient I mean an uncluttered, non-overlapping word cloud with a reasonable image size.
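To make the goal concrete, here is a rough sketch of the kind of heuristic I have in mind: derive the image size from the tags themselves instead of from the word count alone. It is only an illustration, not a tested solution. The function name estimate_canvas_size and the constants char_aspect, fill_ratio and min_side are made up for this example and are not part of pytagcloud. It also assumes each tag is a dict with a 'tag' key (the word) next to the 'size' key used above.

```python
import math

def estimate_canvas_size(tags, char_aspect=0.6, fill_ratio=0.3, min_side=100):
    """Guess a square canvas (width, height) in pixels for a list of tags.

    Each tag is assumed to be a dict with 'tag' (the word) and 'size'
    (the font size). char_aspect and fill_ratio are hypothetical tuning
    constants chosen for illustration.
    """
    if not tags:
        return (min_side, min_side)
    # Approximate each word as a box: height = font size,
    # width = number of characters * font size * average character aspect.
    widths = [len(t['tag']) * t['size'] * char_aspect for t in tags]
    text_area = sum(w * t['size'] for w, t in zip(widths, tags))
    # Leave breathing room: words should cover only fill_ratio of the canvas.
    side = math.sqrt(text_area / fill_ratio)
    # The canvas must at least fit the widest single word.
    side = max(side, max(widths), min_side)
    side = int(math.ceil(side))
    return (side, side)
```

The result could then be passed as the size argument of create_tag_image in place of the fixed 600 * scale values. Whether a fill ratio of 0.3 actually gives an uncluttered layout is a guess and would still need checking against real output.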