Python - Randomly subsample a range of points to plot

Question

I have two lists, x and y, that I wish to plot together in a scatter plot.

The lists contain too many data points, and I would like a graph with far fewer points. I cannot crop or trim these lists; I need to randomly subsample a set number of points from both of them, keeping each x paired with its y. What would be the best way to approach this?

score 4 · Accepted Answer · edited May 23 '17 at 12:34

You could subsample the lists using

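# replace=False so that no point is drawn twice
idx = np.random.choice(len(x), num_samples, replace=False)
# np.asarray lets plain Python lists be indexed by the index array
plt.scatter(np.asarray(x)[idx], np.asarray(y)[idx])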

However, what the plot shows then depends on which points happen to be drawn. We can do better by making a heatmap of all the points. plt.hexbin makes this particularly easy:

plt.hexbin(x, y)

Here is an example, comparing the two methods:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

np.random.seed(2015)
N = 10**5
val1 = np.random.normal(loc=10, scale=2, size=N)
val2 = np.random.normal(loc=0, scale=1, size=N)

fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
cmap = plt.get_cmap('jet')
norm = mcolors.LogNorm()

num_samples = 10**4
idx = np.random.choice(len(val1), num_samples, replace=False)
ax[0].scatter(val1[idx], val2[idx])
ax[0].set_title('subsample')

im = ax[1].hexbin(val1, val2, gridsize=50, cmap=cmap, norm=norm)
ax[1].set_title('hexbin heatmap')

plt.tight_layout()
fig.colorbar(im, ax=ax.ravel().tolist())

plt.show()

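[Figure: output of the code above, with the 'subsample' scatter plot in the top panel and the 'hexbin heatmap' in the bottom panel]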
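The subsampling step can also be wrapped in a small helper. This is only a sketch, not part of the original answer: the name subsample_pairs and its seed parameter are made up for illustration, and it assumes x and y have the same length. It uses NumPy's Generator API (np.random.default_rng) and draws without replacement so that no point appears twice.

```python
import numpy as np


def subsample_pairs(x, y, num_samples, seed=None):
    """Return num_samples matching (x, y) points, drawn without replacement.

    Hypothetical helper for illustration; accepts lists or arrays.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same length")
    rng = np.random.default_rng(seed)
    # Never ask for more points than exist.
    k = min(num_samples, x.shape[0])
    # One index array is used for both x and y, so the pairs stay aligned.
    idx = rng.choice(x.shape[0], size=k, replace=False)
    return x[idx], y[idx]


# Small example with plain lists, where y is always twice x.
x = list(range(1000))
y = [2 * v for v in x]
xs, ys = subsample_pairs(x, y, 100, seed=0)
```

The result can be passed straight to plt.scatter(xs, ys). Fixing the seed makes the same subsample come out on every run, which is handy when comparing plots.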

Thank you. The scatter points are discrete so a heatmap cannot be used in my instance. — Samuel, Jul 09 '15 at 13:47

bakkal · Answer 2 · 2015-07-09T13:06:56.740

You can pick randomly from x and y using a random boolean mask:

import numpy as np
import matplotlib.pyplot as plt

N = 50
x = np.random.rand(N)
y = np.random.rand(N)

# Build a boolean mask of length N with exactly 10 randomly chosen True entries.
# (The mask must be as long as x and y, otherwise boolean indexing fails.)
subsample = np.zeros(N, dtype=bool)
subsample[np.random.choice(N, 10, replace=False)] = True
plt.scatter(x[subsample], y[subsample])
plt.show()

Alternatively, you can use hist2d to plot a 2D histogram, which shows the number of points per bin instead of the individual data points:

plt.hist2d(x, y)  # No need to subsample

score 0 · Answer 3 · answered Jul 09 '15 at 12:52

You can use random.sample():

import random

max_points = len(x)

# Assuming you only want 50 points.
random_indexes = random.sample(range(max_points), 50)

new_x = [x[i] for i in random_indexes]
new_y = [y[i] for i in random_indexes]
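A variant of the same idea, sketched here with only the standard library: zip the two lists into pairs first and sample the pairs, so each x stays with its y without any index bookkeeping. The function name sample_pairs and the seed argument are illustrative assumptions, not part of the answer above.

```python
import random


def sample_pairs(x, y, k, seed=None):
    """Return k randomly chosen (x, y) pairs as two lists (hypothetical helper)."""
    rng = random.Random(seed)
    pairs = list(zip(x, y))
    # random.sample draws without replacement; cap k at the number of pairs.
    chosen = rng.sample(pairs, min(k, len(pairs)))
    if not chosen:
        return [], []
    new_x, new_y = zip(*chosen)
    return list(new_x), list(new_y)


# Example: y is the square of x, so the pairing is easy to check.
x = list(range(200))
y = [v * v for v in x]
new_x, new_y = sample_pairs(x, y, 50, seed=42)
```

The two returned lists can then be plotted with plt.scatter(new_x, new_y).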

user 4
