I am working in a Jupyter Notebook, plotting markers at certain latitudes and longitudes on a map (boundary) using geopandas. I have ~40,000 locations, and I need to mark each of them with a color based on a condition.
A screenshot of the geopandas dataframe gdf is below:
Code Snippet:
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
from time import sleep
from tqdm import tqdm
# we have range of values from 0-15000
threshold1 = [8000,'#e60000']
threshold2 = [500,'#de791e']
threshold3 = [200,'#ff00ff']
threshold4 = [0 ,'#00ff0033']
# Create a dictionary of colors based on threshold
color_dict = {}
for x in gdf.n.to_list():
    if x >= threshold1[0]: color_dict[x] = threshold1[1]
    if x >= threshold2[0] and x < threshold1[0]: color_dict[x] = threshold2[1]
    if x >= threshold3[0] and x < threshold2[0]: color_dict[x] = threshold3[1]
    if x < threshold3[0]: color_dict[x] = threshold4[1]
# Set labels for the legend
a_patch = mpatches.Patch(color=threshold1[1],
                         label=str(threshold1[0]) + '-' + str(max(gdf.n.to_list())))
b_patch = mpatches.Patch(color=threshold2[1],
                         label=str(threshold2[0]) + '-' + str(threshold1[0]))
c_patch = mpatches.Patch(color=threshold3[1],
                         label=str(threshold3[0]) + '-' + str(threshold2[0]))
d_patch = mpatches.Patch(color=threshold4[1],
                         label=str(min(gdf.n.to_list())) + '-' + str(threshold3[0]))
ax = gdf.plot(markersize=0 ,figsize = (20,20))
usa.geometry.boundary.plot(color=None,edgecolor='k',linewidth = 0.5, ax = ax)
# There are ~40,000 values to be iterated here
for x, y, label in zip(tqdm(gdf.geometry.x), gdf.geometry.y, gdf.n):
    ax.annotate('X', weight='bold', xy=(x, y), xytext=(x, y), fontsize=8, color=color_dict[label], ha='center')
    sleep(0.1)
ax.annotate(label, xy=(x, y), xytext=(x, y), fontsize= 8, color = color_dict[label], ha='center')
usa.apply(lambda x: ax.annotate(text = x.NAME, xy=x.geometry.centroid.coords[0], ha='center', fontsize= 2,color='black'),axis=1);
plt.xlim([-130,-60])
plt.ylim([20,55])
plt.legend(handles=[a_patch, b_patch, c_patch, d_patch])
plt.savefig("state.png",pad_inches=0, transparent=False, format = 'png')
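As a side note on the color lookup above: the four bands could also be expressed as one function rather than a dictionary keyed by value. This is only a sketch under assumptions. `color_for`, `BOUNDS` and `COLORS` are hypothetical names introduced here for illustration, and the bounds are copied from threshold1 to threshold4.

```python
from bisect import bisect_right

# Lower bounds of the upper three bands, ascending (same numbers as
# threshold3, threshold2 and threshold1 in the snippet above).
BOUNDS = [200, 500, 8000]
# One color per band: below 200, 200-499, 500-7999, 8000 and above.
COLORS = ['#00ff0033', '#ff00ff', '#de791e', '#e60000']


def color_for(value):
    """Return the band color for a single value (hypothetical helper)."""
    # bisect_right counts how many lower bounds are <= value,
    # which is the index of the band the value falls into.
    return COLORS[bisect_right(BOUNDS, value)]


# Example: one color per value, with no intermediate dictionary.
sample_values = [0, 199, 200, 499, 500, 7999, 8000, 15000]
sample_colors = [color_for(v) for v in sample_values]
```

With the real data, the per-row colors could then be obtained with something like `gdf.n.map(color_for)`.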
I know it is this line which takes the most time:
for x, y, label in zip(tqdm(gdf.geometry.x), gdf.geometry.y, gdf.n):
    ax.annotate('X', weight='bold', xy=(x, y), xytext=(x, y), fontsize=8, color=color_dict[label], ha='center')
    sleep(0.1)
but I cannot figure out any other way to mark each coordinate without looping. Please help me make it faster; 2-3 hours is unreasonable for my work. Thank you!
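One possible direction, sketched here under stated assumptions rather than as a confirmed fix, is to draw all markers with a single `ax.scatter` call instead of one `ax.annotate` call per point. The example uses randomly generated coordinates and values as a stand-in for `gdf`, and `band_color` is a hypothetical helper that mirrors the four threshold bands. Separately, `sleep(0.1)` on each of ~40,000 iterations adds roughly 4,000 seconds (about 67 minutes) of waiting by itself, so the sketch leaves it out.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

random.seed(0)
N_POINTS = 40_000

# Stand-in data (assumption): random points inside the plot limits used above.
xs = [random.uniform(-130, -60) for _ in range(N_POINTS)]
ys = [random.uniform(20, 55) for _ in range(N_POINTS)]
values = [random.randint(0, 15000) for _ in range(N_POINTS)]


def band_color(value):
    """Hypothetical helper mirroring the four threshold bands."""
    if value >= 8000:
        return '#e60000'
    if value >= 500:
        return '#de791e'
    if value >= 200:
        return '#ff00ff'
    return '#00ff0033'


# One color string per point, in the same order as xs and ys.
colors = [band_color(v) for v in values]

fig, ax = plt.subplots(figsize=(20, 20))
# A single call draws every marker; `c` accepts one color per point.
points = ax.scatter(xs, ys, c=colors, marker='x', s=30, linewidths=1.5)
ax.set_xlim(-130, -60)
ax.set_ylim(20, 55)
fig.savefig("state_scatter.png", format="png")
```

With the real dataframe, the inputs would presumably be `gdf.geometry.x`, `gdf.geometry.y` and `gdf.n.map(band_color).tolist()`, passed to `ax.scatter` on the same `ax` that holds the state boundaries. Whether an 'x' marker is an acceptable replacement for the bold 'X' text is a judgment call.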
Some references that I used for my code: