I'm following along with the examples in Wes McKinney's "Python for Data Analysis".
In Chapter 2, we are asked to count the number of times each time zone appears under the 'tz' key, where some entries do not have a 'tz'.
McKinney's count of "America/New_York" comes out to 1251 (there are 2 in the first 10 of the 3440 lines, as you can see below), whereas mine comes out to 1. Why does mine show 1?
I am using Python 2.7, installed from Enthought (epd-7.3-1-win-x86_64.msi) as McKinney instructs in the text. The data comes from https://github.com/Canuckish/pydata-book/tree/master/ch02. In case you can't tell from the title of the book, I am new to Python, so please tell me how to get any information I have not provided.
import json
path = 'usagov_bitly_data2012-03-16-1331923249.txt'
open(path).readline()
records = [json.loads(line) for line in open(path)]
records[0]
records[1]
print records[0]['tz']
The last line here shows 'America/New_York'; the same call on records[1] shows 'America/Denver'.
# count unique time zones
# NOTE: not every JSON entry has a 'tz' key, so the first line below raises a KeyError
time_zones = [rec['tz'] for rec in records]
time_zones = [rec['tz'] for rec in records if 'tz' in rec]
time_zones[:10]
This shows the first ten time zone entries; entries 8 through 10 are blank.
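To make the filtering step concrete, here is a small sketch on made-up data. `sample_records` and `sample_zones` are hypothetical stand-ins, not the real file. It shows that indexing every record fails when a key is missing, while the `'tz' in rec` filter skips that record but still lets blank values through:

```python
# Hypothetical sample records standing in for the parsed JSON lines.
sample_records = [
    {'tz': 'America/New_York'},
    {'tz': ''},            # key present, value blank
    {'a': 'no tz here'},   # key missing entirely
]

# Indexing every record fails on the one without a 'tz' key.
try:
    [rec['tz'] for rec in sample_records]
    raised = False
except KeyError:
    raised = True

# Filtering on key membership skips that record but keeps blank values.
sample_zones = [rec['tz'] for rec in sample_records if 'tz' in rec]
```

This would explain why blank strings still show up in the filtered list: the filter checks only whether the key exists, not whether its value is empty.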
#counting using a dict to store counts
def get_counts(sequence):
counts = {}
for x in sequence:
if x in counts:
counts[x] += 1
else:
counts[x] = 1
return counts
counts = get_counts(time_zones)
counts['America/New_York']
This returns 1, but it should be 1251.
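One thing that may be worth checking, since the indentation of the function did not survive pasting here, is where `return counts` sits. This is only a sketch under that assumption, not a confirmed diagnosis. The names `count_return_in_loop`, `count_return_after_loop`, and `sample` are made up for illustration:

```python
def count_return_in_loop(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
        return counts  # inside the for loop: exits after the first item


def count_return_after_loop(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts  # after the loop: every item has been counted


sample = ['America/New_York', 'America/Denver', 'America/New_York']
early = count_return_in_loop(sample)
full = count_return_after_loop(sample)
```

With the early return, only the first element is ever counted, so the first time zone in the list would come back with a count of 1 and every other zone would be missing from the dict.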
len(time_zones)
This returns 3440, as it should.
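As an independent cross-check, `collections.Counter` from the standard library (available in Python 2.7) should produce the same tallies as a hand-written counting function. A sketch on made-up data follows; `sample_zones` is hypothetical. The per-zone counts should sum to the length of the list, which is a quick sanity test when a count looks too small next to a correct length:

```python
from collections import Counter

# Hypothetical list standing in for time_zones.
sample_zones = ['America/New_York', '', 'America/Denver', 'America/New_York', '']

sample_counts = Counter(sample_zones)
ny = sample_counts['America/New_York']

# The individual counts should add up to the number of items counted.
total = sum(sample_counts.values())
```

If the sum of the values in my own `counts` dict is far below `len(time_zones)`, that would suggest the counting function stopped early rather than a problem with the data.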