I am trying to read two .dat files and build a dictionary keyed by the artist names from aid2name, where each value holds the matching artist id and play count from aid2numplays. The goal is output of the form (artist name, artist id, frequency of plays). The first file provides the artist name and artist id, while the second provides user id, artist id, and a play frequency per user. Any ideas how to sum those per-user frequencies for each artist and then display them in the (artist name, artist id, frequency of plays) format? Below is what I have managed so far:
import codecs

aid2name = {}  # artist id -> artist name
d2 = {}        # artist name -> {artist id: total plays}

with codecs.open("artists.dat", encoding="utf-8") as fp:
    fp.readline()  # skip first line of headers
    for line in fp:
        line = line.strip()
        fields = line.split('\t')
        aid = int(fields[0])
        name = fields[1]
        aid2name[aid] = name  # store the pair in the dict instead of rebinding the name
        d2.setdefault(name, {})

aid2numplays = {}  # artist id -> plays summed over all users

with codecs.open("user_artists.dat", encoding="utf-8") as fp:
    fp.readline()  # skip first line of headers
    for line in fp:
        line = line.strip()
        fields = line.split('\t')
        uid = int(fields[0])
        aid = int(fields[1])
        weight = int(fields[2])
        # add this user's play count to the artist's running total
        aid2numplays[aid] = aid2numplays.get(aid, 0) + weight

for aid, numplays in aid2numplays.items():
    name = aid2name.get(aid)  # None if the id never appeared in artists.dat
    if name is None:
        continue
    group = d2.setdefault(name, {})  # key might exist already
    group[aid] = numplays
    print((name, aid, numplays))
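
For comparison, here is a minimal sketch of the same aggregation wrapped in a function, so it can be tried without the real files. It assumes both files are tab-separated with one header row, that the artist file has the id in the first column and the name in the second, and that the play file has user id, artist id, and play count in that order. The function name total_plays, the "<unknown>" placeholder, and the sample rows are all made up for illustration.

```python
import io
from collections import defaultdict


def total_plays(artists_file, plays_file):
    """Return (artist name, artist id, total plays) tuples, most played first."""
    # artist id -> artist name, read from the first file
    aid2name = {}
    artists_file.readline()  # skip the header row
    for line in artists_file:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2:
            aid2name[int(fields[0])] = fields[1]

    # artist id -> plays summed over every user, read from the second file
    aid2numplays = defaultdict(int)
    plays_file.readline()  # skip the header row
    for line in plays_file:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 3:
            aid2numplays[int(fields[1])] += int(fields[2])

    # join the two dictionaries on artist id; ids with no name get a placeholder
    rows = [(aid2name.get(aid, "<unknown>"), aid, plays)
            for aid, plays in aid2numplays.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Hypothetical sample data standing in for artists.dat and user_artists.dat
artists = io.StringIO("id\tname\n1\tArtist A\n2\tArtist B\n")
plays = io.StringIO("userID\tartistID\tweight\n10\t1\t5\n11\t1\t7\n10\t2\t3\n")

result = total_plays(artists, plays)
for row in result:
    print(row)
```

With the real data, the two io.StringIO objects could be replaced by files opened with open("artists.dat", encoding="utf-8") and open("user_artists.dat", encoding="utf-8"). Keying both dictionaries on the artist id, rather than the name, keeps the join simple and avoids collisions if two artists happen to share a name.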