For each row id, I have a list of values stored in a pandas column. The structure is as follows:
import pandas as pd
df = {'id1':[['a','b','c','d']],'id2':[['a','d','e','j']],'id3':[['b','d','i','q']]}
df = pd.DataFrame.from_dict(df,orient='index')
which gives me:
First, I created a set of the unique values on the side, using this code:
l = df[0].tolist()  # column 0 holds the lists; df.values.tolist() would wrap each list in another list
flat_set = {item for sublist in l for item in sublist}
In the end, I need a sparse version of this:
Notes:
- number of unique values in the set: ~100K
- number of ids: ~60K
I don't mind keeping a dict on the side if shortening the column names reduces memory, but unpacking from lists to a sparse structure is the hard part for me.
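To make the target concrete, here is a minimal sketch of one possible direction on the toy data. It assumes scipy is available and that a 0/1 indicator matrix (one row per id, one column per unique value) is what is wanted. The names col_index, rows, cols and mat are only illustrative, and I have not tried this at the 60K x 100K scale.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

data = {'id1': [['a', 'b', 'c', 'd']],
        'id2': [['a', 'd', 'e', 'j']],
        'id3': [['b', 'd', 'i', 'q']]}
df = pd.DataFrame.from_dict(data, orient='index')

lists = df[0].tolist()  # one list of values per id

# Dict kept on the side: value -> integer column position (sorted for a stable order).
unique_values = sorted({v for sub in lists for v in sub})
col_index = {v: j for j, v in enumerate(unique_values)}

# Collect one (row, column) coordinate per distinct value of each id.
rows, cols = [], []
for i, sub in enumerate(lists):
    for v in set(sub):  # set() drops duplicates within a single id
        rows.append(i)
        cols.append(col_index[v])

# Build the sparse indicator matrix; only the non-zero entries are stored.
mat = csr_matrix(
    (np.ones(len(rows), dtype=np.int8), (rows, cols)),
    shape=(len(lists), len(col_index)),
)

print(mat.shape, mat.nnz)
```

Row i of mat corresponds to df.index[i], and col_index maps each value back to its column, so the long column names never need to be stored in the matrix itself.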
Please help :)