I have a function to find common, uncommon items and its rates between a given list (one list) and other
lists (60,000 lists) for each user (4,000 users). Running below loop takes too long time and high momery usage
with partial list construction and crash. I think due to the long returned list and heavy elements (tuples),
so I divided it into two functions as below , but it seems the problem in appending list items in the tuple,
[(user, [items],rate),(user, [items],rate),....]
. I want to create a dataframes from returned values,
What should I do to an algorithm to get around this matter and reduce memory usage?
Iam using python 3.7, windows 10, 64-bit , RAM 8G.
common items function:
def common_items(user,list1, list2):
com_items = list(set(list1).intersection(set(list2)))
com_items_rate = len(com_items)/len(set(list1).union(set(list2)))
return user, com_items, com_items_rate
uncommon items function:
def uncommon_items(user,list1, list2):
com_items = list(set(list1).intersection(set(list2)))
com_items_rate = len(com_items)/len(set(list1).union(set(list2)))
uncom_items = list(set(list2) - set(com_items)) # uncommon items that blonge to list2
uncom_items_rate = len(uncom_items)/len(set(list1).union(set(list2)))
return user, com_items_rate, uncom_items, uncom_items_rate # common_items_rate is also needed
Constructing the list:
common_item_rate_tuple_list = []
for usr in users: # users.shape = 4,000
list1 = get_user_list(usr) # a function to get list1, it takes 0:00:00.015632 or less for a user
# print(usr, len(list1))
for list2 in df["list2"]: # df.shape = 60,000
common_item_rate_tuple = common_items(usr,list1, list2)
common_item_rate_tuple_list.append(common_item_rate_tuple)
print(len(common_item_rate_tuple_list)) # 4,000 * 60,000 = 240,000,000 items
# sample of common_item_rate_tuple_list:
#[(1,[2,5,8], 0.676), (1,[7,4], 0.788), ....(4000,[1,5,7,9],0.318), (4000,[8,9,6],0.521)
I looked at (Memory errors and list limits?) and (Memory error when appending to list in Python) they deal with constructed list. And I couldnot deal with suggested answer for (Python list memory error).