I am trying to fetch data from Last.fm and write it to a CSV file. I have a.csv, and for each line of a.csv I fetch additional data from Last.fm and save it to b.csv. As a result, a.csv and b.csv have the same number of lines.
a.csv is a large text file with about 8 million data lines, so I am trying to run multiple processes, each handling about 250,000 lines.
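To make the split concrete, it can be sketched roughly as follows. The helper name `chunk_ranges` is made up for illustration and is not taken from my actual script, and 8,000,000 is only the approximate line count.

```python
def chunk_ranges(total_lines, chunk_size):
    """Return (start, end) line ranges, end exclusive, that cover total_lines."""
    return [(start, min(start + chunk_size, total_lines))
            for start in range(0, total_lines, chunk_size)]

# Approximate figures from the description above.
ranges = chunk_ranges(8000000, 250000)
print(len(ranges))  # 32 chunks of 250,000 lines each
```

Each process (or terminal) is then given one of these ranges and its own output CSV file.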
I tried the Python multiprocessing module, and I also tried running the script in multiple terminals. The problem is that most of the time (about 9 times out of 10, or more), the processes randomly stop writing to their CSV files.
For example, I start 4 processes, and they begin writing to separate CSV files as expected. Then, after a random amount of time, some of the CSV files are no longer modified. Sometimes one of the CSV files stops a few minutes after I start the process, while the others stop after a few hours, or a few tens of hours. The pattern is completely random, and only very rarely do all the processes finish successfully, which is why I cannot figure out why they keep stopping. I tried on other computers and saw no difference, so the problem does not seem to depend on computing resources.
Also, even though the CSV files stop being modified, the processes are still running. I can tell because the code prints its progress to the terminal every 1000 data lines.
The following is the overall structure of my code (in abstracted form, with only the parts I thought were indispensable for understanding the program):
f_reader = csv.reader(f, delimiter=',')
# (same for other csv files needed ..)
for line in a.csv:
    if 1000 data lines are processed:
        print('1000 tracks processed')
    url = Lastfm API root url + selective data in line
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)  # fetch data from Last.fm and save result to response
    info = etree.fromstring(response.read())
    temp1 = info.find('data1').text.encode('utf-8')
    temp2 = info.find('data2').text.encode('utf-8')
    temp = [temp1, temp2]
    for column in temp:
        f.write('%s;' % column)
    f.write('\n')
f.close()
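Since the progress messages keep printing while the files stop changing, one thing that may be worth ruling out is output buffering. Below is a minimal sketch of the write step with an explicit flush. `write_row` is a hypothetical helper, not part of my actual script, and I have not confirmed that buffering is the cause.

```python
import os

def write_row(out_file, columns):
    """Write one semicolon-terminated row and force it out to disk."""
    for column in columns:
        out_file.write('%s;' % column)
    out_file.write('\n')
    out_file.flush()             # push Python's internal buffer to the OS
    os.fsync(out_file.fileno())  # ask the OS to commit the data to disk
```

If the file's modification time still stops advancing with this in place, buffering can probably be excluded as the explanation.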
Can anyone help?