
My question is similar to my previous one, Python list help (incrementing count, appending). The accepted answer there works well; however, this time I have a different question.

I'm parsing strings from a JSON file, doing some cleanup, and then appending each one to a new string. I need a count for each word (so each unique word appears once, and its occurrence count goes up every time it repeats), sorted from high to low (I believe I need most_common here), and limited to 20 entries. I can do all of this in JavaScript but not in Python.

In detail, I'm again using a for loop to get each string from strings (the JSON strings file), like this:

from collections import Counter

# Counter for each word.
words = Counter()

for e in strings:
    # I am cleaning up the string here (removing unwanted chars, making it lower case)
    # and appending it to a new string variable.
    # If I were to print the new string variable, it would look like this:
    # hello test another test append hi hai hello hello
    ...

# I know I need to call words.update.
# Should I run a for loop over my new string variable, one word at a time?

Also, how can I limit it to 20?

What I would like to generate is something like this:

word, count
hello 3
test 2
another 1
append 1
hai 1
hi 1

Any suggestions would be great, thanks.

sophros
chatu

1 Answer


If you have a list of words, you'd use the .update() method:

words.update(some_list_of_words)
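Note that .update() adds to the counts already stored rather than replacing them, so calling it once per string inside the loop accumulates totals across all strings. A small sketch with made-up word lists:

```python
from collections import Counter

words = Counter()

# Each call adds to the existing counts; nothing is overwritten.
words.update(['hello', 'test', 'another'])
words.update(['test', 'hello', 'hello'])

print(words['hello'])    # 3
print(words['test'])     # 2
print(words['another'])  # 1
print(words['missing'])  # 0: a Counter returns 0 for unseen keys instead of raising KeyError
```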

You can pass in a generator expression too:

words.update(word.lower() for word in e.split())

This splits the string e into separate words on whitespace, lowercases each word, and counts them.
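Putting that inside the question's loop means there is no need to build an intermediate string or loop over words by hand. The sketch below assumes strings is a list of raw strings already loaded from the JSON file (the sample data is made up), and clean() is a hypothetical stand-in for your own cleanup step, here using a regex that replaces punctuation with spaces:

```python
import re
from collections import Counter

# Made-up stand-in for the strings loaded from the JSON file.
strings = ['Hello, test!', 'another TEST; append', 'hi hai hello... hello']

def clean(text):
    """Hypothetical cleanup: lowercase, then replace every character that is
    not a word character (letter, digit, underscore) or whitespace with a space."""
    return re.sub(r'[^\w\s]', ' ', text.lower())

words = Counter()
for e in strings:
    # Split the cleaned string on whitespace and add those words to the running counts.
    words.update(clean(e).split())

print(words)
```

Adjust the regular expression to whatever "unwanted chars" means for your data.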

.most_common() takes an optional parameter, the maximum number of items to return:

words.most_common(20)
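To produce output like the word, count table in the question, iterate over the (word, count) pairs that most_common() returns; they already come sorted from highest to lowest count. A sketch using the question's sample cleaned string as stand-in input:

```python
from collections import Counter

# The sample cleaned string from the question, used as stand-in data.
text = 'hello test another test append hi hai hello hello'
words = Counter(text.split())

# At most 20 (word, count) pairs, highest count first.
top = words.most_common(20)

print('word, count')
for word, count in top:
    print(word, count)
```

With fewer than 20 distinct words, most_common(20) simply returns all of them. Words with equal counts come out in the order they were first encountered (Python 3.7+), so hi is listed before hai here.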

Demo with a smaller set of words, limiting it to the top 3 most common words:

>>> from collections import Counter
>>> words = Counter('spam ham eggs baz foo bar baz spam ham eggs spam spam bacon eggs ham spam spam spam eggs ham'.split())
>>> words.most_common(3)
[('spam', 7), ('ham', 4), ('eggs', 4)]
Martijn Pieters