I am trying to split a large dictionary into n smaller dictionaries. Each dictionary entry contains a web address, and the point of splitting the dictionary is so that the web scraping of these addresses can be spread over several computers.
The dictionary is in the form:
{
    u'25637293':
        [u'Methyldopa', u'http://www.ncbi.nlm.nih.gov/pubmed/25637293', 43579],
    u'25672666':
        [u'Furosemide', u'http://www.ncbi.nlm.nih.gov/pubmed/25672666', 40750]
}
with 13000 key/value pairs.
The last element of each value is meant to be an index from 0 to 13000.
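Everything below relies on that index really covering the range once per entry, so a small check along these lines could confirm it. This is only a sketch: `index_report` is a hypothetical helper name, and `sample` is just the two entries shown above, not the full dictionary.

```python
def index_report(dictionary):
    """Summarise the last element of each value (the supposed index)."""
    indexes = [value[2] for value in dictionary.values()]
    return {
        "count": len(indexes),        # how many entries there are
        "unique": len(set(indexes)),  # how many distinct index values
        "min": min(indexes),          # smallest index found
        "max": max(indexes),          # largest index found
    }

# Hypothetical sample: only the two entries shown above.
sample = {
    u'25637293': [u'Methyldopa', u'http://www.ncbi.nlm.nih.gov/pubmed/25637293', 43579],
    u'25672666': [u'Furosemide', u'http://www.ncbi.nlm.nih.gov/pubmed/25672666', 40750],
}
report = index_report(sample)
```

If `max` comes out larger than the number of entries, or `unique` is smaller than `count`, then matching against `range(len(dictionary))` cannot reach every entry.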
This is what I have tried (although I have probably overcomplicated things):
1) Create a list of 13000 values
2) Split this list into n parts
3) Ensure that every dictionary entry carries an index within that range
4) Iterate through one part of the list. If an item i equals the index stored in a dictionary entry, the web address can be extracted from that entry for scraping
(the scraping itself is not included in the code)
smalldict = {}

# create a list from 0-13000 and split it into chunks of n items each
def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

# number of entries in the big dictionary
number = len(dictionary)
# number of dictionaries to divide it into (one per computer)
computers = 4
# this is the 'name' of the computer that is running the script
compno = 1
# -1 because of 0 indexing
compm = compno - 1
listlength = number / computers
divider = range(number)
division = chunks(divider, listlength)

for entry in dictionary:
    # get the list stored under this key
    value = dictionary[entry]
    # only look at the chunk of indexes that belongs to this computer
    for i in division[compm]:
        # if this index matches the index stored in the entry, keep the entry
        if i == value[2]:
            smalldict[value[1]] = value
I would have thought len(smalldict) would be 13000/4 (since len(dictionary) is 13000, as is len(division[0]) when there is only one list in the division), but it returns only a few hundred. It is not splitting as it is supposed to.
I've been working on this for several days. Can anyone help?
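For comparison, this is a minimal sketch of the kind of result I am after, splitting on the keys directly rather than on the stored index. `split_dict` is a hypothetical helper name, and `big` is made-up stand-in data with placeholder URLs, not my real dictionary. It assumes every computer holds the same full dictionary, so sorting the keys gives each of them the same order.

```python
def split_dict(dictionary, n):
    """Split dictionary into n smaller dictionaries of near-equal size."""
    n = max(1, n)
    # Sort the keys so every computer agrees on the same ordering.
    keys = sorted(dictionary)
    # keys[i::n] takes every n-th key, starting at position i.
    return [dict((k, dictionary[k]) for k in keys[i::n]) for i in range(n)]

# Made-up stand-in for the real 13000-entry dictionary.
big = dict(
    (str(i), ['drug%d' % i, 'http://example.invalid/%d' % i, i])
    for i in range(13000)
)

computers = 4  # number of dictionaries to divide into
compno = 1     # 'name' of the computer running the script (1-based)

parts = split_dict(big, computers)
smalldict = parts[compno - 1]  # -1 because of 0 indexing
```

With this approach the size of each part depends only on len(dictionary) and the number of computers, not on what the index in each value happens to be.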