Is there a way to improve this code written in Python? I use the Pandas library with Python 3.4:
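import bisect
import pandas as pd

bd_data = pd.DataFrame(list(bd_data))
column = list(bd_data[numeric])
for i in range(0, len(column)):
    # Position of the current value within the sorted interval boundaries
    pos = bisect.bisect_left(intervalsArray, int(column[i]))
    bd_data.ix[i, 'colorCluster'] = colorsPalette[pos]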
I'm trying to assign a color to colorCluster from colorsPalette based on the position of a number in a sorted list of intervals. It takes about 6 seconds to process 16000 rows, which is far too slow. I think I'm not using Pandas the way it is intended, especially here:
bd_data.ix[i,'colorCluster']
I'm currently doing this in R (through rpy2) with a single line of code that runs in less than a second:
dataToAnalyse$colorCluster <- colorsPalette[findInterval(dataToAnalyse$numeric, intervals)+1]
I'm sure there is a way to improve performance in Python, since many people say it is often (though not always) faster than R for this kind of processing. Also, please suggest a better title for the question, as I'm not fluent in Pandas terminology.
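For reference, here is a minimal sketch of the kind of vectorized lookup I suspect is possible, replacing the row-by-row loop with a single `numpy.searchsorted` call. The sample intervals, palette, and column values are made up for illustration; only the names `bd_data`, `intervalsArray`, `colorsPalette`, and `colorCluster` come from my code, and I'm assuming the palette has one more entry than there are interval boundaries.

```python
import bisect

import numpy as np
import pandas as pd

# Hypothetical sample data (not my real data): three sorted boundaries
# and a palette with one more color than there are boundaries.
intervalsArray = [10, 20, 30]
colorsPalette = np.array(["#ffffcc", "#a1dab4", "#41b6c4", "#225ea8"])
bd_data = pd.DataFrame({"numeric": [5, 10, 15, 25, 30, 99]})

# One vectorized call instead of a Python loop;
# side="left" gives the same positions as bisect.bisect_left.
values = bd_data["numeric"].astype(int).values
positions = np.searchsorted(intervalsArray, values, side="left")

# Index the palette with the whole position array and assign the column at once.
bd_data["colorCluster"] = colorsPalette[positions]

# Cross-check against the element-by-element approach from the loop.
expected = [colorsPalette[bisect.bisect_left(intervalsArray, int(v))] for v in values]
```

As far as I can tell, R's `findInterval` counts the boundaries that are less than or equal to each value, which would correspond to `side="right"`, so the two versions may disagree for values that fall exactly on a boundary. The sketch uses `side="left"` only to match `bisect_left` in my loop.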