I am trying to create a word cloud in Python using pytagcloud. My current code generates a cloud, but all the words are the same size. How can I alter the code so that each word's size reflects its frequency?

My text file already contains the words with their frequency counts, one pair per line, in the format "George, 44" (newline) "Harold, 77" (newline) "Andrew, 22" (newline), and so on. However, when the cloud displays a word, it also displays the frequency next to it.

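from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts
import webbrowser

# Read the whole file into one string (this also discards the line breaks)
with open("MyText.txt", "r") as file:
    Data = file.read().replace('\n', '')

# Count the words in the text and build the tags
tags = make_tags(get_tag_counts(Data), maxsize=150)

create_tag_image(tags, 'Sample.png', size=(1200, 1200), background=(0, 0, 0, 255), fontname='Lobstero', rectangular=True)

webbrowser.open('Sample.png')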
Hooked
Taylor
  • Where are `make_tag`, `get_tag_counts`, and `create_tag_image` defined? – Kevin Mar 25 '15 at 19:10
  • Try opening MyText.txt as a csv file, so you get rows in which the name and frequency are associated but distinct. – cphlewis Mar 25 '15 at 19:11
  • @Taylor Welcome to Stack Overflow! You _must_ include your import statements or mention the library you are using. It was _not_ obvious that you were using this library -- without this information we can't help you! Be sure to upvote and accept good answers and ask for help on the meta if you need it. – Hooked Mar 25 '15 at 19:27
  • Can't save it as .csv, I am on a Windows machine. Also, I am using the pytagcloud library. – Taylor Mar 25 '15 at 19:30
  • FYI for new visitors, now there is a `wordcloud` package I would use instead – wordsforthewise Nov 06 '17 at 03:09

1 Answer

You need to pass make_tags a list of (word, count) tuples, with each count as an integer. Using your question as input text we get the expected result:

from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts

TEXT = '''I am trying to create a word cloud in python. With my current cloud, I can generate a cloud, but the words all are the same size. How can I alter the code so that my words' sizes appear in relation to their frequency?'''

counts = get_tag_counts(TEXT)
tags = make_tags(counts, maxsize=120)
create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='Lobster')

It is worth looking at the variable counts:

[('cloud', 3), 
('words', 2), 
('code', 1), 
('word', 1), 
('appear', 1), ...

which is simply a list of (word, count) tuples. Since your input text file already contains the words and their counts, you can build this list yourself and pass it straight into make_tags, without calling get_tag_counts.
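To see why running the frequency file through a word counter gives equal sizes, here is a minimal sketch. `collections.Counter` is only a stand-in for a text counter here, not pytagcloud's actual implementation, and the sample string is assumed from the format described in the question.

```python
import re
from collections import Counter

# Assumed sample, following the "word, number" format from the question
raw = "George, 44\nHarold, 77\nAndrew, 22\n"

# Stand-in for counting words in running text (NOT pytagcloud's own code):
# every token, including each number, occurs exactly once.
tokens = re.findall(r"\w+", raw.lower())
as_text = Counter(tokens).most_common()

# What make_tags needs instead: the number parsed as the count of its word.
counts = []
for line in raw.splitlines():
    word, n = line.rsplit(",", 1)
    counts.append((word.strip(), int(n)))

print(as_text)
print(counts)
```

When the file is treated as text, the numbers become words of their own and every entry has a count of one, which would explain both symptoms in the question.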

Edit: You can read a file like this:

counts = []
with open("tag_file.txt") as FIN:
    for line in FIN:
        # Assume lines look like: word, number
        word, n = line.strip().split()
        word = word.replace(',', '')  # drop the trailing comma
        # Store the count as an int so it can scale the word size
        counts.append((word, int(n)))
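The reader above assumes exactly two whitespace-separated fields per line, so a blank line or a multi-word name will raise an unpacking error. A stricter variant could look like the sketch below. The names `parse_counts` and `render_cloud` are hypothetical, and the pytagcloud calls simply reuse the arguments shown earlier in this answer.

```python
def parse_counts(lines):
    """Turn 'word, number' lines into a list of (word, int) tuples."""
    counts = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the LAST comma so multi-word names stay in one piece
        word, sep, number = line.rpartition(",")
        if not sep:
            raise ValueError("line %d has no comma: %r" % (lineno, line))
        try:
            counts.append((word.strip(), int(number)))
        except ValueError:
            raise ValueError("line %d has a non-integer count: %r" % (lineno, line))
    return counts


def render_cloud(path_in, path_out):
    # Imported lazily so parse_counts can be tried without pytagcloud installed
    from pytagcloud import create_tag_image, make_tags
    with open(path_in) as fin:
        counts = parse_counts(fin)
    tags = make_tags(counts, maxsize=120)
    create_tag_image(tags, path_out, size=(900, 600), fontname='Lobster')
```

Calling `render_cloud("tag_file.txt", "cloud_large.png")` would then report the offending line number instead of a bare unpacking error.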
Hooked
  • Sorry, this doesn't work. The words are in a file, separated. If I just enter them as a string, this ignores their respective frequency. When I tried your code, it didn't generate a word cloud. Just for the sake of clarity: each word in my list is listed only once, with its frequency from a corpus directly beside it and separated by a comma. – Taylor Mar 25 '15 at 19:33
  • @Taylor I think your problem is reading in the file. If you post some sample code that mimics your frequency table we can go from there. I'll add a small example that would show how to read a file though. – Hooked Mar 25 '15 at 19:43
  • The items in my text file involve a string and an integer. The integer needs to somehow affect the size of the words in the word cloud, as each word has a different predetermined frequency. If I turn them into a string as you demonstrated, each word would occur only once and therefore not affect the sizing. That is not what I need. – Taylor Mar 25 '15 at 19:44
  • The integer _will_ change the sizing if you pass in the tuple `counts`. You might not be casting it to an int so the module thinks the number is a string. – Hooked Mar 25 '15 at 19:50
  • That's more helpful, but when I tried your edited code above, it told me that there were too many values to unpack. I have 98 lines. Is there any way to remedy that? – Taylor Mar 25 '15 at 20:00
  • The code that I used to read in the file is listed above. Is it easier if we talk about this via private chat? – Taylor Mar 25 '15 at 20:02
  • @Taylor sorry, chat is blocked at my current location. "Too many values to unpack" means that your text file doesn't follow the exact format of "word count", maybe you have a header line that says "# my info" or a line with too many fields, "word word count". Post a few **raw** lines of the text file into your question. – Hooked Mar 25 '15 at 20:09
  • Bill, 310 (newline), Andrew, 4022 (newline) Frank, 818(newline) George, 1345 – Taylor Mar 25 '15 at 20:14
  • I updated the code to try and guess what the problem is. I stripped the line, maybe you have extra characters you are not aware of at the ends, and I removed the commas. If this doesn't work, upload the text file somewhere and I can look at it. – Hooked Mar 25 '15 at 20:40
  • Hooked, I can't upload the file anywhere. But for reference, it's an Excel file with the names in column one and their frequency counts in column two. Please help, I need this project done soon. Also, even when I reduced the number of items in the Excel sheet, it still said that it couldn't handle the data. – Taylor Apr 01 '15 at 13:03
  • Hey Hooked, could you please help me? – Taylor Apr 01 '15 at 15:28
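If the data lives in a spreadsheet, one option is to export it as CSV and read it with Python's `csv` module, along the lines of the suggestion in the comments on the question. This is only a minimal sketch, assuming a two-column export with an optional header row. The name `counts_from_rows` and the sample rows are hypothetical.

```python
import csv


def counts_from_rows(rows):
    """Build (word, int) tuples from two-column rows, skipping unusable rows."""
    counts = []
    for row in rows:
        if len(row) < 2 or not row[0].strip():
            continue  # empty or incomplete row
        try:
            counts.append((row[0].strip(), int(row[1])))
        except ValueError:
            continue  # assumed to be a header such as "Name,Frequency"
    return counts


# csv.reader accepts any iterable of lines, so an open file works the same way
sample = ["Name,Frequency", "Bill,310", "Andrew,4022", ""]
counts = counts_from_rows(csv.reader(sample))
print(counts)
```

The resulting list has the same shape as `counts` in the answer, so it can be passed to make_tags directly. Note that this sketch silently skips rows whose second column is not an integer, which is convenient for headers but can hide real data errors.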