
I've been trying to do this for a while now, but for some reason I can't do it.

I have a large data set, which includes a timeseries of parameter1 and parameter2. I am plotting parameter1 as a function of time, but would like the color of the plot to be based on the probability distribution of parameter2.

For example:

import numpy as np
import matplotlib.pyplot as plt

time = np.arange(0., 5., 0.02)
parameter1 = np.sin(2*np.pi*time)
###parameter2 = np.random.randn(10000)
parameter2 = np.cos(2*np.pi*time)

# Plot parameter1 as a function of time
plt.plot(time, parameter1)

# Plot parameter2 distribution
plt.hist(parameter2)

In my data, parameter1 and parameter2 both vary as a function of time. I therefore want to color parameter1 using a colormap based on the distribution of parameter2. How would I do this? Is there a straightforward way that I'm not aware of? The main problem is that the two parameters change differently with time, so it's not clear to me how to map one onto the other.

EDIT: Both parameters share the same time data points, i.e. for each point in time I have a value of parameter1 and a value of parameter2. So although parameter1 is not related to parameter2 (even though in the above example they are), they share the same time axis.

besi
  • I think this is a question of logic, which is unclear: parameter2 seems to be completely unrelated to time as well as to parameter1; so what exactly should be the criterion by which to colorize the points? You may give a minimal example, taking 10 points or so to illustrate what you want or you may tell us which function would define the color. Otherwise I don't think one can help you here. – ImportanceOfBeingErnest May 20 '17 at 10:33
  • Something like this? http://stackoverflow.com/questions/7881994/matplotlib-how-to-change-data-points-color-based-on-some-variable – user1620443 May 20 '17 at 10:38
  • @user1620443 I have seen that, but the problem there is simpler, or maybe I don't know how to extend that solution to my problem? And, also, I would like to avoid using a scatter plot. – besi May 20 '17 at 10:52
  • Can we simplify the problem? Say we have `time = [0,1,2,3,4,5]; p1 = [0,1,0,-1,0,1]; p2 = [1,2,3,3,3,1]`, can you say which color each of the points of `plot(time, p1)` should have in dependence of `p2` and why? – ImportanceOfBeingErnest May 20 '17 at 11:01
  • The scatter plot in the link uses time on the x axis (so it's pretty much what you do using "plot"), and the color is related to parameter 2. If that's not what you are trying to achieve, I am afraid the question is still not clear enough. – user1620443 May 20 '17 at 11:01
  • @ImportanceOfBeingErnest Sure! Here is exactly what I want to do: plt.hist(p2, bins=3, normed=True). Each bin should have a different color (or better, a different shade of the same color). Now, to color a point in plot(time, p1), check which bin the corresponding point in p2 falls into, and color the point in p1 using that bin's color. – besi May 20 '17 at 11:19

1 Answer


You would have to calculate the value of the histogram for each point in the time array, e.g. like this:

import numpy as np; np.random.seed(12)
import matplotlib.pyplot as plt

def plot(t, p1, p2, **kwargs):
    # Plot parameter1 as a function of time
    plt.plot(t, p1, zorder=0, color="gray")

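    # Count how many parameter2 values fall into each histogram bin
    hist, edges = np.histogram(p2)

    def lookup(x):
        # Map a value to the index of its (equal-width) bin. The last bin is
        # closed, so nudge the maximum value back inside it first.
        if x == edges[-1]:
            x = edges[-1]- .1*(edges[-1] - edges[-2])
        return int((len(edges)-1.)/(edges[-1]-edges[0])*(x-edges[0]))

    # Color each point by the number of parameter2 values in its bin
    c = [hist[lookup(p)] for p in p2]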
    sc = plt.scatter(t, p1, c=c, **kwargs)
    plt.colorbar(sc)
    plt.show()


time = [0,1,2,3,4,5]
parameter1 = [0,1,0,-1,0,1]
parameter2 = [1,2,3,3,3,1]

plot(time,parameter1,parameter2, s=100)

time = np.arange(0., 5., 0.02)
parameter1 = np.sin(2*np.pi*time)
parameter2 = np.random.randn(250)

plot(time,parameter1,parameter2)
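
As a possible alternative to the hand-written `lookup` above, the per-point bin index can also be obtained with `np.digitize`. This is only a sketch: the helper name `counts_per_point` and the choice of `bins=3` in the usage line are assumptions made for illustration, not part of the answer's code.

```python
import numpy as np

def counts_per_point(p2, bins=10):
    """For each value in p2, return the count of the histogram bin it falls in.

    Hypothetical helper, not from the original answer.
    """
    p2 = np.asarray(p2, dtype=float)
    hist, edges = np.histogram(p2, bins=bins)
    # Digitizing against the interior edges only yields indices 0..nbins-1,
    # so the maximum value lands in the (closed) last bin automatically.
    idx = np.digitize(p2, edges[1:-1])
    return hist[idx], edges

parameter2 = [1, 2, 3, 3, 3, 1]
counts, edges = counts_per_point(parameter2, bins=3)
print(counts)  # [2 1 3 3 3 2]
```

The resulting `counts` array could then be passed as `c=counts` to `plt.scatter`, in place of the list built with `lookup`.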

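
Since the question mentions wanting to avoid a scatter plot, one option that might fit is to color the line itself, segment by segment, with Matplotlib's `LineCollection`. The following is a rough sketch under that assumption; `make_segments`, `colored_line` and `demo` are hypothetical names, and each segment simply takes the value of its starting point.

```python
import numpy as np

def make_segments(t, y):
    """Return an (N-1, 2, 2) array of segments joining consecutive points."""
    pts = np.column_stack([t, y]).reshape(-1, 1, 2)
    return np.concatenate([pts[:-1], pts[1:]], axis=1)

def colored_line(ax, t, y, values, cmap="Blues"):
    """Draw y(t) on ax as a line whose segments are colored by `values`."""
    # Imported here so make_segments stays usable without Matplotlib.
    from matplotlib.collections import LineCollection
    lc = LineCollection(make_segments(t, y), cmap=cmap)
    # One color value per segment: use the value at the segment's start.
    lc.set_array(np.asarray(values, dtype=float)[:-1])
    ax.add_collection(lc)
    ax.autoscale_view()
    return lc

def demo():
    """Color sin(2*pi*t) by the histogram count of a random parameter2."""
    import matplotlib.pyplot as plt
    rng = np.random.default_rng(12)
    time = np.arange(0.0, 5.0, 0.02)
    parameter1 = np.sin(2 * np.pi * time)
    parameter2 = rng.standard_normal(time.size)
    hist, edges = np.histogram(parameter2)
    # Bin count for every point; interior edges keep the maximum in the last bin.
    counts = hist[np.digitize(parameter2, edges[1:-1])]
    fig, ax = plt.subplots()
    lc = colored_line(ax, time, parameter1, counts)
    fig.colorbar(lc, ax=ax, label="points in parameter2 bin")
    plt.show()
```

Calling `demo()` should open a figure with the sine curve shaded by bin count. A single-hue colormap such as `"Blues"` is used here because the question asks for different shades of one color.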

ImportanceOfBeingErnest
  • Thank you! This will likely solve my problem. I appreciate your help. – besi May 20 '17 at 13:24
  • Once you have made sure that it works, you may [accept](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) the answer. If some problem occurs, you may of course refine your question. In any case, I would recommend updating your question to include the details from the comments; a question should be complete by itself (without the need to read the comments). – ImportanceOfBeingErnest May 20 '17 at 13:30
  • @ImportanceOfBeingErnest so did I understand this right, the colouring of the data points reflects the number of points in the respective histogram bin? Would that be clearer with a `colorbar`? – Thomas Kühn May 20 '17 at 18:16
  • @Thomas That's at least the idea. I just saw that it may be off by one bin, I will have a closer look at it later. – ImportanceOfBeingErnest May 20 '17 at 18:33
  • @ImportanceOfBeingErnest yes, at least I would expect that in the first plot points with the same y-value would have the same colour. Maybe only subtract `1` in `len(edges)-2.`? – Thomas Kühn May 20 '17 at 18:39
  • The plot itself is actually correct. Color is according to the number of occurrences in parameter2, not parameter1: you have the value 3 three times, and all of those have the same lightest color; you have the 1 only once and it has the darkest color. So far that is all good, but the numeric values in the histogram are off by one bin. Subtracting 1 would not work, but indeed something in that line needs to change. Will look at that after dinner. – ImportanceOfBeingErnest May 20 '17 at 18:46
  • @ImportanceOfBeingErnest I tried it, and I think it works (have to test it more though, and for my actual problem I am dealing with a much more complex data structure). But I think there is a bug with your lookup function. For x=2, it gives c=0, but it should give c=1. I'm assuming this is due to 4.5 being rounded to 4, and not 5, but I'm not sure. Your approach is however correct, so I'll use that as a foundation to solve my problem. Thank you! – besi May 20 '17 at 18:48
  • So I corrected the error. The problem was that the last bin is a closed bin, so it includes the maximum value. This is now taken care of with a function that is a little more complicated. Colorbars are included, so that the number of occurrences can be read from the color. – ImportanceOfBeingErnest May 20 '17 at 20:21
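
The closed last bin mentioned in the final comment can be checked on a tiny example. This is an illustrative sketch only: `naive_index` and `clipped_index` are hypothetical helpers, and clipping is just one possible way to handle the maximum value.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0])
hist, edges = np.histogram(values, bins=2)
# edges are [1, 2, 3]; the bins are [1, 2) and [2, 3], so the
# maximum value 3 is counted in the last bin.

def naive_index(x, edges):
    """Equal-width bin index that treats every bin as half-open."""
    nbins = len(edges) - 1
    return int(nbins * (x - edges[0]) / (edges[-1] - edges[0]))

def clipped_index(x, edges):
    """Same index, but the maximum value is kept inside the last bin."""
    return min(naive_index(x, edges), len(edges) - 2)

print(hist)                       # [1 2]
print(naive_index(3.0, edges))    # 2, one past the last valid index
print(clipped_index(3.0, edges))  # 1
```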