0

I have a file that lists bands along with an album and the year it was released. I need to write a function that goes through this file, finds the distinct band names, and counts how many times each band appears.

The way the file looks is like this:

Beatles - Revolver (1966)
Nirvana - Nevermind (1991)
Beatles - Sgt Pepper's Lonely Hearts Club Band (1967)
U2 - The Joshua Tree (1987)
Beatles - The Beatles (1968)
Beatles - Abbey Road (1969)
Guns N' Roses - Appetite For Destruction (1987)
Radiohead - Ok Computer (1997)
Led Zeppelin - Led Zeppelin 4 (1971)
U2 - Achtung Baby (1991)
Pink Floyd - Dark Side Of The Moon (1973)
Michael Jackson -Thriller (1982)
Rolling Stones - Exile On Main Street (1972)
Clash - London Calling (1979)
U2 - All That You Can't Leave Behind (2000)
Weezer - Pinkerton (1996)
Radiohead - The Bends (1995)
Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995)
.
.
.

The output has to be in descending order of frequency and look like this:

band1: number1
band2: number2
band3: number3

Here is the code I have so far:

def read_albums(filename) :

    file = open("albums.txt", "r")
    bands = {}
    for line in file :
        words = line.split()
        for word in words:
            if word in '-' :
                del(words[words.index(word):])
        string1 = ""
        for i in words :
            list1 = []

            string1 = string1 + i + " "
            list1.append(string1)
        for k in list1 :
            if (k in bands) :
                bands[k] = bands[k] +1
            else :
                bands[k] = 1


    for word in bands :
        frequency = bands[word]
        print(word + ":", len(bands))

I think there's an easier way to do this, but I'm not sure. Also, I'm not sure how to sort a dictionary by frequency. Do I need to convert it to a list?

Martijn Pieters
  • 1
    Take a look at [`collections.Counter`](http://docs.python.org/2/library/collections.html#collections.Counter) – Lukas Graf Aug 07 '13 at 16:36

3 Answers

2

You are right, there is an easier way, with Counter:

from collections import Counter

with open('bandfile.txt') as f:
    # line.strip() is empty for a blank line, so blank lines are skipped
    counts = Counter(line.split('-')[0].strip() for line in f if line.strip())

for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))

What exactly is this doing: line.split('-')[0].strip() for line in f if line.strip()?

This expression is a short form of the following loop:

temp_list = []
for line in f:
    if line.strip():  # skip blank lines (a bare "\n" is still truthy, so test the stripped line)
        bits = line.split('-')
        temp_list.append(bits[0].strip())

counts = Counter(temp_list)

Unlike the loop above, however, it doesn't create an intermediate list. Instead, it is a generator expression, a more memory-efficient way to step through things, which is passed as the argument to Counter.
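To make the generator expression concrete, here is a minimal sketch that runs it over an in-memory list instead of a file. The `SAMPLE_LINES` list and the `count_bands` helper are made up for illustration; a real file object can be passed in the same way, because iterating over a file yields its lines.

```python
from collections import Counter

# Hypothetical sample lines standing in for the file's contents.
SAMPLE_LINES = [
    "Beatles - Revolver (1966)\n",
    "Nirvana - Nevermind (1991)\n",
    "\n",  # a blank line read from a file still holds a newline
    "Beatles - Abbey Road (1969)\n",
    "Michael Jackson -Thriller (1982)\n",
]


def count_bands(lines):
    """Count band names from an iterable of 'Band - Album (Year)' lines."""
    # Generator expression: each name is produced on demand and handed
    # straight to Counter, so no intermediate list is built.
    names = (line.split('-')[0].strip() for line in lines if line.strip())
    return Counter(names)


counts = count_bands(SAMPLE_LINES)
for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))
```

One caveat with this approach: splitting on a bare hyphen would cut short any band name that itself contains a hyphen.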

Burhan Khalid
  • Note that `Counter` is only available on 2.7 and later. If you're using something earlier than that, check out the accepted answer here: http://stackoverflow.com/questions/613183/python-sort-a-dictionary-by-value – Peter DeGlopper Aug 07 '13 at 16:42
  • I'm still pretty new to python, so what does the with statement do? Not in this code, but in general. – Preston May Aug 07 '13 at 16:51
  • Preston, check out [this writeup](http://effbot.org/zone/python-with-statement.htm) on effbot. – Burhan Khalid Aug 07 '13 at 16:54
  • `with` binds `open('bandfile.txt')` to `f`, so that in the lines that follow, you can refer to the former using the latter. – Charles Marsh Aug 07 '13 at 16:56
  • 1
    @CharlesMarsh although not incorrect, there is a bit more to the with statement. For one it automatically closes the file when the execution moves out of the context of the statement; and it handles exceptions as well. The [documentation](http://docs.python.org/2/reference/compound_stmts.html#with) and the [writeup on effbot](http://effbot.org/zone/python-with-statement.htm) go into more details. – Burhan Khalid Aug 07 '13 at 16:59
  • Thanks guys. Also, what exactly is this doing: line.split('-')[0].strip() for line in f if line? I understand the line.split part but what is the [0] for and the for statement within the line? – Preston May Aug 07 '13 at 17:01
  • `line.split('-')` returns a list of strings that were separated by hyphens. `[0]` gets the first string in that list. The `for` is part of a generator expression. Search for generator expressions and list comprehensions in Python – Chris Barker Aug 07 '13 at 17:05
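
The two points raised in these comments can be checked with a short sketch: that `with` closes the file on exit, and that `split('-')` followed by `[0]` picks out the band name. The temporary file and the variable names here are assumptions made so the example runs on its own; they are not part of the answer.

```python
import os
import tempfile

# Write a tiny throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("U2 - The Joshua Tree (1987)\n")

with open(path) as f:
    first_line = f.readline()
    inside_closed = f.closed   # False: the file is open inside the block

after_closed = f.closed        # True: leaving the block closed the file

# split('-') returns a list of pieces; [0] picks the first one.
pieces = first_line.split('-')
band = pieces[0].strip()

os.remove(path)
print(inside_closed, after_closed, band)
```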
1

If you're looking for conciseness, use a "defaultdict" and "sorted":

from collections import defaultdict

bands = defaultdict(int)
with open('tmp.txt') as f:
    for line in f:  # iterate over the file directly; xreadlines() no longer exists in Python 3
        band = line.split('-')[0].strip()  # also handles "Michael Jackson -Thriller"
        if band:  # skip blank lines
            bands[band] += 1
for band, count in sorted(bands.items(), key=lambda t: t[1], reverse=True):
    print('%s: %d' % (band, count))
thierrybm
  • Why sorted? The question doesn't ask for sorted output. Note that `collections.Counter().most_common()` would be more concise still, as it returns items in reverse sorted order by frequency for you. – Martijn Pieters Aug 07 '13 at 16:43
  • True; hadn't seen the Counter solution when I wrote mine, it is better! – thierrybm Aug 08 '13 at 13:14
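
As a rough comparison of the two approaches discussed here, the sketch below tallies the same names with a `defaultdict` plus `sorted` and with `Counter.most_common()`. The `names` list is hypothetical sample data, not taken from the file.

```python
from collections import Counter, defaultdict

# Hypothetical band names, already extracted from the lines.
names = ["Beatles", "Nirvana", "Beatles", "U2", "Beatles", "U2"]

# defaultdict(int) supplies 0 for a missing key, so += 1 always works.
tally = defaultdict(int)
for name in names:
    tally[name] += 1
by_sorted = sorted(tally.items(), key=lambda t: t[1], reverse=True)

# Counter does the tallying itself and most_common() does the sorting.
by_counter = Counter(names).most_common()

print(by_sorted)
print(by_counter)
```

Both give a list of (band, count) pairs ordered from most to least frequent; `Counter` just needs fewer lines to get there.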
0

My approach is to use the split() method to break each line of the file into a list of tokens. You can then take the band name (the first token in the list) and add the names to a dictionary that keeps track of the counts:

import operator

def main():
  band_counts = {}

  #build a dictionary that adds each band as it is listed, then increments the count for re-lists
  with open("albums.txt", "r") as f:
    for line in f:
      line_items = line.split("-") #break up the line into individual tokens
      band = line_items[0].strip() #strip surrounding whitespace, including the newline

      #don't want to add blank lines to the band list
      if not band:
        continue

      if band in band_counts:
        band_counts[band] += 1 #band already in the counts, increment the counts
      else:
        band_counts[band] = 1  #if the band was not already in counts, add it with a count of 1

  #create a list of results sorted by count, highest first
  sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True)

  for item in sorted_list:
    print(item[0] + ":", item[1])

main()

Notes:

  1. I followed the advice of this answer to create the sorted results: Sort a Python dictionary by value
  2. If you are new to Python, check out Google's Python class. I found it very helpful when I was just getting started: https://developers.google.com/edu/python/?csw=1
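
On the question of whether the dictionary has to be converted to a list: `sorted()` does that for you. The sketch below assumes a small hypothetical `band_counts` dictionary and shows that the result is a new list of (band, count) tuples, with the dictionary itself left unchanged.

```python
import operator

# Hypothetical counts, as the loop above might produce them.
band_counts = {"Nirvana": 1, "Beatles": 4, "U2": 3, "Radiohead": 2}

# sorted() accepts any iterable and always returns a new list; here it is
# a list of (band, count) tuples, ordered by count, highest first.
ranked = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True)

for band, count in ranked:
    print("{0}: {1}".format(band, count))
```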
caffreyd