
I have a DataFrame with a column named label. The values in the column look like this:

label
[1,2]
[0,2,1]

For each row, I want to create a vector of dimension 240 that has the value 1 at the positions listed in label and 0 everywhere else.

label_output
[0,1,1,0.......0]
[1,1,1,0,0,0....0]
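To make the target concrete, here is a minimal sketch of the mapping on the two example rows. The names `multi_hot` and `VECTOR_SIZE` are made up for illustration, and the sketch assumes each label may arrive either as a string such as "[1,2]" or as an actual list of zero-based positions.

```python
import ast

VECTOR_SIZE = 240  # dimension stated above


def multi_hot(label, size=VECTOR_SIZE):
    """Return a list of `size` zeros with a 1 at each position listed in `label`."""
    # Assumption: a label may be a string like "[1,2]"; parse it if so.
    indices = ast.literal_eval(label) if isinstance(label, str) else label
    vec = [0] * size
    for i in indices:
        vec[int(i)] = 1
    return vec


examples = ["[1,2]", "[0,2,1]"]
vectors = [multi_hot(label) for label in examples]

print(vectors[0][:4])  # [0, 1, 1, 0]
print(vectors[1][:4])  # [1, 1, 1, 0]
```

Each result has length 240, and only the listed positions are set to 1.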

I am trying to use pandarallel because I have 60 million data points.

Code

%load_ext autoreload
%autoreload 2
import pandas as pd
import time
from pandarallel import pandarallel
import math
import numpy as np

pandarallel.initialize(use_memory_fs=False, nb_workers=10, progress_bar=True)

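%%time
import ast

def cluster_vec(lists):
    # Start from a 240-dimensional vector of zeros.
    b = [0] * 240
    # literal_eval expects a string such as "[1,2]" and turns it into a list.
    lists = ast.literal_eval(lists)
    # Set a 1 at every position named in the list.
    for num in lists:
        b[int(num)] = 1
    return b

# The function can be passed directly; no lambda wrapper is needed.
data['clus_vec'] = data.label.parallel_apply(cluster_vec)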

It gets stuck at some point. Here is a screenshot of the progress:

[screenshot]

It does not make any progress beyond this point.

MAC