
I have a large 2D dataset in which I want to associate a color with each X,Y pair and plot it with matplotlib. I am talking about 1,000,000 points. What is the best approach in terms of performance (speed), and could you point me to an example?

Open the way
  • Are the points on a regular grid? Do you want them on a regular grid? You need to give more information! 1e6 isn't that many points, though. You should be okay with a scatterplot, if that's what you're wanting. 1e6 pixels for an image isn't much at all, so if the points are on a regular grid, you'd have no problem there, either... – Joe Kington Jun 11 '11 at 19:59
    also see http://stackoverflow.com/questions/4082298/scatter-plot-with-a-huge-amount-of-data – Jeff Jun 11 '11 at 20:04
  • Have you looked at `imshow`, then? http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.imshow How exactly do you want to specify the color for the X,Y pair? Do you want a unique color for each of the 1e6 positions? (It won't be possible to visually distinguish 1e6 colors...) Do you want to map the value to a colorbar? (This is what `imshow` does by default) – Joe Kington Jun 11 '11 at 20:09
  • I would like a kind of BMP figure where each x,y pair has a color that depends on its value. I will have a look at your answer – Open the way Jun 11 '11 at 20:14

1 Answer


If you're dealing with a regular grid, just treat it as an image:

import numpy as np
import matplotlib.pyplot as plt

nrows, ncols = 1000, 1000
z = 500 * np.random.random(nrows * ncols).reshape((nrows, ncols))

plt.imshow(z, interpolation='nearest')
plt.colorbar()
plt.show()

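The simplest case is worth spelling out: if the values arrive as a flat array that is already ordered row by row, no gridding is needed and a reshape is enough. A minimal sketch, assuming row-major ordering and a hypothetical 4 x 5 grid (neither is stated in the question):

```python
import numpy as np

# Hypothetical grid shape, chosen small for illustration
nrows, ncols = 4, 5

# Flat values assumed to be in row-major (C) order:
# every column of row 0 first, then every column of row 1, and so on
z_flat = np.arange(nrows * ncols, dtype=float)

# Reshape into a 2D array; grid[i, j] is the value at row i, column j
grid = z_flat.reshape((nrows, ncols))
```

The resulting `grid` can be passed straight to `plt.imshow` as in the example above.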

If you have randomly ordered x,y,z triplets that make up a regular grid, then you'll need to grid them.

Essentially, you might have something like this:

import numpy as np 
import matplotlib.pyplot as plt

# Generate some data
nrows, ncols = 1000, 1000
xmin, xmax = -32.4, 42.0
ymin, ymax = 78.9, 101.3

dx = (xmax - xmin) / (ncols - 1)
dy = (ymax - ymin) / (nrows - 1)

x = np.linspace(xmin, xmax, ncols)
y = np.linspace(ymin, ymax, nrows)
x, y = np.meshgrid(x, y)

z = np.hypot(x - x.mean(), y - y.mean())
x, y, z = [item.flatten() for item in (x,y,z)]

# Scramble the order of the points so that we can't just simply reshape z
indices = np.arange(x.size)
np.random.shuffle(indices)
x, y, z = [item[indices] for item in (x, y, z)]

# Up until now we've just been generating data...
# Now, x, y, and z probably represent something like you have.

# We need to make a regular grid out of our shuffled x, y, z values.
# To do this, we have to know the cellsize (dx & dy) that the grid is on and
# the number of rows and columns in the grid. 

# First we convert our x and y positions to indices...
# (np.int was removed from NumPy; the builtin int does the same job)
idx = np.round((x - x.min()) / dx).astype(int)
idy = np.round((y - y.min()) / dy).astype(int)

# Then we make an empty 2D grid...
grid = np.zeros((nrows, ncols), dtype=float)

# Then we fill the grid with our values:
grid[idy, idx] = z

# And now we plot it:
plt.imshow(grid, interpolation='nearest', 
        extent=(x.min(), x.max(), y.max(), y.min()))
plt.colorbar()
plt.show()
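
The example above relies on knowing the cell size and the grid shape. If you don't know them, one possible alternative (a sketch, not part of the approach above) is to recover the indices with `np.unique`. This assumes each grid coordinate repeats exactly in the data; with noisy floating-point coordinates you would need to round them first. The small 3 x 4 grid and the variable names here are made up for illustration:

```python
import numpy as np

# Hypothetical shuffled triplets that lie on a 3 x 4 regular grid
xs = np.linspace(-1.0, 2.0, 4)
ys = np.linspace(10.0, 12.0, 3)
xx, yy = np.meshgrid(xs, ys)
zz = xx + 10 * yy

rng = np.random.default_rng(0)
order = rng.permutation(xx.size)
x, y, z = (a.ravel()[order] for a in (xx, yy, zz))

# np.unique returns the sorted distinct coordinates; with
# return_inverse=True it also returns, for each point, the position of
# its coordinate in that sorted array, which is the column/row index.
ux, idx = np.unique(x, return_inverse=True)
uy, idy = np.unique(y, return_inverse=True)

# Start from NaN so that any cell without data stays visibly empty
grid = np.full((uy.size, ux.size), np.nan)
grid[idy, idx] = z
```

Here `grid` can be plotted with `plt.imshow` using the same `extent` as above, and `ux.size` and `uy.size` give the number of columns and rows.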


Joe Kington