I have code like:
import pandas as pd
import multiprocessing as mp

a = {'a': [1, 2, 3, 1, 2, 3], 'b': [5, 6, 7, 4, 6, 5], 'c': ['dog', 'cat', 'tree', 'slow', 'fast', 'hurry']}
df = pd.DataFrame(a)

def performDBSCAN(feature):
    value = scorecalculate(feature)
    print(value)
    for ele in range(4):
        value = value + 1
    print('here value is ', value)
    return value

def processing(feature):
    result1 = performDBSCAN(feature)
    return result1

def scorecalculate(feature):
    scorecal = 0
    for val in ['a', 'b', 'c', 'd']:
        print('alpha is:', val)
        scorecal = scorecal + 1
    return scorecal

columns = df.columns
for ele in df.columns:
    processing(ele)
The above code runs serially. I would like to make it faster by processing each column in parallel, so I wrote the following code using multiprocessing, but it didn't help.
import pandas as pd
import multiprocessing as mp

def performDBSCAN(feature):
    value = scorecalculate(feature)
    print(value)
    for ele in range(4):
        value = value + 1
    print('here value is ', value)
    return value

def scorecalculate(feature):
    scorecal = 0
    for val in ['a', 'b', 'c', 'd']:
        print('alpha is:', val)
        scorecal = scorecal + 1
    return scorecal

def processing(feature):
    result1 = performDBSCAN(feature)
    return result1

a = {'a': [1, 2, 3, 1, 2, 3], 'b': [5, 6, 7, 4, 6, 5],
     'c': ['dog', 'cat', 'tree', 'slow', 'fast', 'hurry']}
df = pd.DataFrame(a)
columns = df.columns
pool = mp.Pool(4)
resultpool = pool.map(processing, columns)
I don't see any output, and the kernel keeps running indefinitely. What could be the issue? Is there another way to do this, for example with a library such as numba?

(Note: this code is only a simplified example. The basic idea is that I have to take each column of a dataframe and run the DBSCAN algorithm on it. Based on the DBSCAN result, another function calculates a score. Both functions are stubbed out in the code above; the increment operations are only there to verify that each function is actually entered. The first version of the code processes the columns serially, whereas I need to parallelise the for loop so that multiple columns are processed at the same time.)
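
For reference, here is a minimal sketch of the pattern the multiprocessing documentation recommends: create the pool only under an `if __name__ == '__main__':` guard and keep the worker functions at module level. It assumes the hang is related to how worker processes are started, which has not been confirmed for this setup. The helper `run_parallel` is a hypothetical name, the column names are hard-coded to keep the sketch free of dependencies (in the code above they would come from `df.columns`), and the two worker functions are the same placeholders as in the question with the prints removed.

```python
import multiprocessing as mp


def scorecalculate(feature):
    # Placeholder for the real scoring step: counts four dummy iterations.
    scorecal = 0
    for _ in ['a', 'b', 'c', 'd']:
        scorecal += 1
    return scorecal


def performDBSCAN(feature):
    # Placeholder for the real DBSCAN step: adds four to the score.
    value = scorecalculate(feature)
    for _ in range(4):
        value += 1
    return value


def processing(feature):
    # Return the column name with its result so results can be matched up.
    return feature, performDBSCAN(feature)


def run_parallel(columns, workers=4):
    # Hypothetical helper: maps processing() over the columns in a pool.
    # The context manager shuts the pool down once map() has returned.
    with mp.Pool(workers) as pool:
        return dict(pool.map(processing, columns))


if __name__ == '__main__':
    # Stand-in for list(df.columns) from the code above (an assumption).
    columns = ['a', 'b', 'c']
    print(run_parallel(columns))
```

This is meant to be run as a script. In a notebook, especially on Windows, the worker functions may additionally need to live in a separate importable `.py` file, because the worker processes have to be able to import them.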