
I'm creating a single scatter plot from about 500 data files, each a few hundred megabytes. I want all the points from a given file to share one color, determined by a single float value in that file's metadata.

I can't find a way to first set the color range for the whole plot and then set the color of a given plt.scatter call to a value within that range. No matter what I try, matplotlib seems to want a color for each point from an iterable the same size as the data. That isn't practical for my application, because building a single array for all of my data would take several gigabytes.
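One way to get this behavior, sketched here under stated assumptions, is to fix the range once with matplotlib.colors.Normalize, turn each file's metadata value into a single RGBA color through the colormap, and pass that one color to scatter via color=. No per-point color array is built. The datasets list, its values, and the range c_min, c_max = 0.0, 10.0 are hypothetical stand-ins for the real files and metadata. A ScalarMappable that shares the same norm and colormap supplies a matching colorbar.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
from matplotlib import cm, colors

# Hypothetical stand-ins for the real files: (metadata value, N x 2 data array).
rng = np.random.default_rng(0)
datasets = [(val, rng.normal(size=(200, 2)) + val) for val in (1.0, 4.5, 8.0)]

# Assumed global range of the metadata values, fixed once for the whole plot.
c_min, c_max = 0.0, 10.0
norm = colors.Normalize(vmin=c_min, vmax=c_max)
cmap = plt.get_cmap("viridis")

fig, ax = plt.subplots()
for val, data in datasets:
    # One RGBA color per file, picked from the colormap by the metadata value.
    ax.scatter(data[:, 0], data[:, 1], color=cmap(norm(val)), s=4)

# A ScalarMappable with the same norm and cmap gives a colorbar matching the points.
sm = cm.ScalarMappable(norm=norm, cmap=cmap)
fig.colorbar(sm, ax=ax, label="metadata value")

buf = io.BytesIO()
fig.savefig(buf, format="png")
```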

In pseudocode, what I'd like to do is something along these lines (color_range is a made-up argument standing in for the option I'm looking for):

for file in files:
    val = get_metadata(file)
    data = np.genfromtxt(file)
    color_range = [c_min, c_max]
    plt.scatter(data[:,0],
                data[:,1],
                color_range = color_range,
                c = val)
plt.show()

Is there a built-in matplotlib way to do this? I haven't been able to find one in the documentation.
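A sketch of the pseudocode above using arguments that scatter does accept, under stated assumptions: vmin and vmax play the role of color_range, and c receives len(data) copies of val, so matplotlib maps every point in a file to the same color. That array only covers one file at a time (one float per plotted point, kept with that file's scatter collection), not one array for the whole dataset built up front. The two in-memory "files", their metadata values, the get_metadata helper, and the 0 to 10 range are all hypothetical.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; use plt.show() interactively
import matplotlib.pyplot as plt

# Hypothetical stand-ins: two tiny whitespace-delimited "files" and their metadata.
files = {"a.txt": "0 0\n1 1\n2 4\n", "b.txt": "0 1\n1 2\n2 3\n"}
metadata = {"a.txt": 2.0, "b.txt": 7.5}

def get_metadata(name):
    """Hypothetical helper: return the metadata float for one file."""
    return metadata[name]

c_min, c_max = 0.0, 10.0  # assumed fixed range shared by every scatter call

fig, ax = plt.subplots()
for name in files:
    val = get_metadata(name)
    data = np.genfromtxt(io.StringIO(files[name]))
    # c holds one copy of val per point in this file only.
    sc = ax.scatter(data[:, 0], data[:, 1],
                    c=np.full(len(data), val),
                    cmap="viridis", vmin=c_min, vmax=c_max)

# Every call shares cmap/vmin/vmax, so any one of them can drive the colorbar.
fig.colorbar(sc, ax=ax)
```

Because each call gets the same cmap, vmin and vmax, the colors from different files are directly comparable, and it doesn't matter which collection is passed to colorbar.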

It was suggested that this is a duplicate of this question, but my case is slightly different. The solution offered there,

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

x = np.arange(10)
ys = [i+x+(i*x)**2 for i in range(10)]

colors = cm.rainbow(np.linspace(0, 1, len(ys)))  # one color per curve, taken in order
for y, c in zip(ys, colors):
    plt.scatter(x, y, color=c)
plt.show()

iterates through the predefined colors in order. I want to define the colormap as above, but then pick a color from it using the metadata value, which is a continuous float within a known range.
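A minimal sketch of that change applied to the example above: instead of indexing the colormap with np.linspace, normalize each curve's value into [0, 1] with a fixed Normalize and look the colors up with that. The vals array and the 0 to 10 range are hypothetical stand-ins for the per-file metadata.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; use plt.show() interactively
import matplotlib.pyplot as plt
from matplotlib import colors

x = np.arange(10)
ys = [i + x + (i * x) ** 2 for i in range(10)]

# Hypothetical per-curve metadata floats (stand-ins for get_metadata(file)).
vals = np.array([0.3, 2.7, 9.1, 4.4, 5.0, 1.2, 7.8, 6.6, 3.3, 8.5])

norm = colors.Normalize(vmin=0.0, vmax=10.0)  # assumed fixed metadata range
cmap = plt.get_cmap("rainbow")
rgba = cmap(norm(vals))  # one RGBA row per curve, chosen by value rather than position

fig, ax = plt.subplots()
for y, c in zip(ys, rgba):
    ax.scatter(x, y, color=c)
```

Here a curve with value 5.0 always lands in the middle of the colormap, regardless of where it appears in the list.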

David
