I have a DataFrame as follows:
col1 col2 col3 col4 col5 col6 col7
0.6 '0' 'first' 0.93 'lion' 0.34 0.98
0.7 '1' 'second' 0.47 'cat' 0.43 0.76
0.4 '0' 'third' 0.87 'tiger' 0.24 0.10
0.6 '0' 'first' 0.93 'lion' 0.34 0.98
0.5 '1' 'first' 0.32 'tiger' 0.09 0.99
0.4 '0' 'third' 0.78 'tiger' 0.18 0.17
0.5 '1' 'second' 0.98 'cat' 0.47 0.78
I need to pass each column (col1, col2, col3, and so on) of the above DataFrame to a function in a for loop, as below:
results = []
for col in df.columns:
    result = performDBSCAN(df[col])
    results.append([col, result])

def performDBSCAN(feature):
    ......(some implementation)
    score = scorecalculate(feature)
    ......(some implementation)
    return somevalue

def scorecalculate(feature):
    .......(some implementation)
    return scorecal
Basically, I want to run the above code over many columns, and it takes a long time to finish. I have 404 columns and 5000 rows. How can I make it faster, or run it in parallel, in Python? I would also appreciate suggestions on whether TensorFlow or Spark could help. (I ask because I have no experience with Spark or TensorFlow and am only looking for a suggestion.)
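
Since each column is processed independently of the others, one direction that might fit is a process pool from the standard library, with one task per column. The following is only a minimal sketch under stated assumptions: `score_column` is a hypothetical stand-in for performDBSCAN (it just counts distinct values, it is not DBSCAN), `score_columns` is a hypothetical helper name, and the columns are passed as a plain dict of lists (with pandas, something like `{c: df[c].tolist() for c in df.columns}` could build it).

```python
from concurrent.futures import ProcessPoolExecutor


def score_column(values):
    """Hypothetical stand-in for performDBSCAN: counts distinct values."""
    return len(set(values))


def score_columns(columns, executor=None):
    """Score every column independently and return [[name, score], ...].

    `columns` maps a column name to a list of values. If no executor is
    passed, a ProcessPoolExecutor is created so that columns can be
    processed on separate CPU cores.
    """
    names = list(columns)
    column_values = [columns[name] for name in names]
    if executor is None:
        with ProcessPoolExecutor() as pool:
            # map() yields results in the same order as the inputs
            scores = list(pool.map(score_column, column_values))
    else:
        scores = list(executor.map(score_column, column_values))
    return [[name, score] for name, score in zip(names, scores)]
```

When the default process pool is used, the call to `score_columns` would normally sit under an `if __name__ == "__main__":` guard, because some platforms start worker processes by re-importing the main module. Whether this actually helps depends on how expensive the real per-column function is compared with the cost of sending each column to a worker.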