I have some source data that isn't on a regular grid (a sample is shown in the csv variable in the code below). I can't guarantee any minimum, maximum or step values in this data, so I need to determine them from the source data itself.
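Deriving the minimum, maximum and step of each axis from the data can be sketched in NumPy. This is a minimal sketch that assumes integer coordinates. The helper name axis_grid is hypothetical. Inferring the step as the greatest common divisor of the gaps between distinct values is one possible heuristic, not something the data format guarantees.

```python
import numpy as np


def axis_grid(values):
    """Return (min, max, step) for one integer coordinate axis (hypothetical helper).

    The step is inferred as the greatest common divisor of the gaps between
    the distinct sorted values, so missing points do not hide the grid spacing.
    Assumes integer coordinates and at least one value.
    """
    u = np.unique(np.asarray(values, dtype=np.int64))  # sorted, distinct values
    if u.size == 1:
        return int(u[0]), int(u[0]), 1
    step = int(np.gcd.reduce(np.diff(u)))
    return int(u[0]), int(u[-1]), step


# X and Y columns of the sample data from the question
xs = [1001] * 5 + [1003] * 5 + [1005] * 5 + [1007] * 5 + [1013] * 5
ys = ([1001, 1003, 1005, 1007, 1009] + [1002, 1004, 1006, 1008, 1012]
      + [1001, 1003, 1005, 1007, 1009] + [1002, 1004, 1006, 1008, 1010]
      + [1001, 1003, 1005, 1007, 1009])

print(axis_grid(xs))  # (1001, 1013, 2)
print(axis_grid(ys))  # (1001, 1012, 1)
```

On the sample, X turns out to have a step of 2. A grid with step 1 therefore leaves every other X column empty. With a known step, a value at x would map to grid index (x - xmin) // step.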
After reading the data and defining the values needed to plot the image, I came up with the loop below. Running it on a 150k-line file showed that it is very slow: it took around 110 seconds (!!!) to render the whole image, and the image is very small.
Any hints are welcome, even if they involve other libraries or data types. My main goal is to display "heat maps" from CSV sources like this one, which can span up to a million lines. Reading the file into the DataFrame and plotting the graph are both fast; the bottleneck is building the image array from the CSV data.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io
csv = """
"X","Y","V"
1001,1001,909.630432
1001,1003,940.660156
1001,1005,890.571594
1001,1007,999.651062
1001,1009,937.775513
1003,1002,937.601074
1003,1004,950.006897
1003,1006,963.458923
1003,1008,878.646851
1003,1012,956.835938
1005,1001,882.472656
1005,1003,857.491028
1005,1005,907.293335
1005,1007,877.087891
1005,1009,852.005554
1007,1002,880.791931
1007,1004,862.990967
1007,1006,882.135864
1007,1008,896.634521
1007,1010,888.916626
1013,1001,853.410583
1013,1003,863.324341
1013,1005,843.284607
1013,1007,852.712097
1013,1009,882.543640
"""
data=io.StringIO(csv)
columns = [ "X" , "Y", "V" ]
df = pd.read_csv(data, sep=',', skip_blank_lines=True, quoting=2, skipinitialspace=True, usecols = columns, index_col=[0,1] )
# Fields
x_axis="X"
y_axis="Y"
val="V"
# Unique values on the X-Y axis
x_ind=df.index.get_level_values(x_axis).unique()
y_ind=df.index.get_level_values(y_axis).unique()
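# Maxima and minima
xmin = x_ind.min()
xmax = x_ind.max()
ymin = y_ind.min()
ymax = y_ind.max()
# Size of each axis: one cell per integer coordinate from min to max,
# so that img[ix, iy] holds the value at (xmin + ix, ymin + iy)
nx = int(xmax - xmin) + 1
ny = int(ymax - ymin) + 1
img = np.zeros((nx,ny))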
print("Entering loop")
# Slow part: one DataFrame lookup per grid cell
for ix in range(nx):
    print("Mapping {0} {1}".format(x_axis, ix))
    for iy in range(ny):
        try:
            img[ix, iy] = df.loc[(ix + xmin, iy + ymin), val]
        except KeyError:
            # No sample at this (X, Y) position
            img[ix, iy] = np.nan
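# Transpose so X runs along the horizontal axis; origin="lower" puts ymin at the bottom
plt.imshow(img.T, origin="lower", extent=[xmin, xmax, ymin, ymax], cmap=plt.cm.jet, interpolation=None)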
plt.colorbar()
plt.show()
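A loop-free alternative to the nested loop above is to let pandas do the reshaping. unstack moves the Y level of the MultiIndex into the columns, and reindex then expands both axes to the full integer ranges, filling missing positions with NaN. This is a sketch that assumes integer coordinates with step 1 (like the loop) and unique (X, Y) pairs. It uses only a few rows of the sample and has not been timed on the 150k-line file.

```python
import io

import numpy as np
import pandas as pd

# A few rows of the sample data from the question
csv = """
"X","Y","V"
1001,1001,909.630432
1001,1003,940.660156
1003,1002,937.601074
1007,1010,888.916626
1013,1009,882.543640
"""
df = pd.read_csv(io.StringIO(csv), index_col=["X", "Y"])

x_lvl = df.index.get_level_values("X")
y_lvl = df.index.get_level_values("Y")
xmin, xmax = int(x_lvl.min()), int(x_lvl.max())
ymin, ymax = int(y_lvl.min()), int(y_lvl.max())

# Move the Y level into the columns: one row per X, one column per Y.
# unstack requires every (X, Y) pair to be unique.
grid = df["V"].unstack("Y")

# Expand to the full integer ranges; positions with no data become NaN.
grid = grid.reindex(index=range(xmin, xmax + 1), columns=range(ymin, ymax + 1))

# Same layout as the loop: img[ix, iy] is the value at (xmin + ix, ymin + iy)
img = grid.to_numpy()
print(img.shape)  # (13, 10)
```

The resulting array can go to the same plotting call, e.g. plt.imshow(img.T, origin="lower", extent=[xmin, xmax, ymin, ymax]), where the transpose puts X on the horizontal axis. Because the work happens in a few vectorized pandas operations instead of one .loc lookup per cell, it should scale much better with the number of rows, though that is an expectation, not a measurement.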
I tried pcolormesh, but I could not fit the values into the mesh correctly without a similar loop; in particular, I could not build z_mesh without it:
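# +1 so that xmax and ymax are included in the mesh
x_mesh, y_mesh = np.mgrid[xmin:xmax + 1, ymin:ymax + 1]
z_mesh = ...  # ?? hints ?? ;-)  (the array I could not build without a loop)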
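For the pcolormesh route, z_mesh can be filled without a loop using NumPy advanced (fancy) indexing. Each row's offset from the minimum gives its cell, and a single assignment scatters all values at once. This is a minimal sketch that assumes integer coordinates with step 1 and unique (X, Y) pairs. The 1-D arrays x, y and v are hypothetical stand-ins for the X, Y and V columns (for example df.reset_index()["X"].to_numpy()), filled here with a few rows of the sample.

```python
import numpy as np

# Hypothetical 1-D column arrays; here a few rows of the sample data
x = np.array([1001, 1001, 1003, 1007, 1013])
y = np.array([1001, 1003, 1002, 1010, 1009])
v = np.array([909.630432, 940.660156, 937.601074, 888.916626, 882.543640])

xmin, xmax = x.min(), x.max()
ymin, ymax = y.min(), y.max()

# One mesh node per integer coordinate; +1 so xmax and ymax are included
x_mesh, y_mesh = np.mgrid[xmin:xmax + 1, ymin:ymax + 1]

# NaN marks positions with no data; then scatter every value in one
# vectorized assignment: row = x - xmin, column = y - ymin
z_mesh = np.full(x_mesh.shape, np.nan)
z_mesh[x - xmin, y - ymin] = v

print(z_mesh.shape)  # (13, 10)
```

z_mesh[i, j] then lines up with the point (x_mesh[i, j], y_mesh[i, j]). With a recent Matplotlib (shading="nearest" was added in 3.3, as far as I know), the mesh could be drawn with plt.pcolormesh(x_mesh, y_mesh, np.ma.masked_invalid(z_mesh), shading="nearest"), where masking leaves the NaN cells blank.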