I have a DataFrame with a column named label. The values in the column are:
label
[1,2]
[0,2,1]
I want to create a vector of dimension 240 that has the value 1 at the positions listed in label, and 0 everywhere else.
label_output
[0,1,1,0.......0]
[1,1,1,0,0,0....0]
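To make the mapping concrete, here is a minimal serial sketch of the transformation applied to the two sample rows. It assumes the labels are stored as strings such as "[1,2]"; the names multi_hot and VEC_SIZE are made up for illustration and are not part of my actual code.

```python
import ast

import numpy as np

VEC_SIZE = 240  # vector dimension from the description above


def multi_hot(label, size=VEC_SIZE):
    """Return a 0/1 vector with 1 at every index listed in `label`."""
    # Assumption: labels may arrive as strings like "[1,2]"; parse those first.
    indices = ast.literal_eval(label) if isinstance(label, str) else label
    vec = np.zeros(size, dtype=np.uint8)
    # Set every listed position to 1 in a single indexing step.
    vec[np.asarray(indices, dtype=np.intp)] = 1
    return vec


sample = ["[1,2]", "[0,2,1]"]
vectors = [multi_hot(s) for s in sample]
print(vectors[0][:4].tolist())  # [0, 1, 1, 0]
print(vectors[1][:4].tolist())  # [1, 1, 1, 0]
```

The remaining 236 entries of each vector stay 0, which matches the label_output rows shown above.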
I am trying to use pandarallel, as I have 60 million data points.
Code
%load_ext autoreload
%autoreload 2
import pandas as pd
import time
from pandarallel import pandarallel
import math
import numpy as np
pandarallel.initialize(use_memory_fs=False,nb_workers=10,progress_bar=True)
%%time
import ast
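def cluster_vec(lists):
    # Start from an all-zero vector of length 240.
    b = [0] * 240
    # literal_eval expects a string such as "[1,2]" and turns it into a list.
    lists = ast.literal_eval(lists)
    for num in lists:
        b[int(num)] = 1
    return b

data['clus_vec'] = data.label.parallel_apply(cluster_vec)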
It gets stuck at some point. Here is a screenshot of the progress bars.
They do not advance any further from here.