From scatter plot to 2D array

Question

My mind has gone completely blank on this one.

I want to do what I think is very simple.

Suppose I have some test data:

import pandas as pd
import numpy as np
k=10
df = pd.DataFrame(np.array([range(k), 
                           [x + 1 for x in range(k)],
                           [x + 4 for x in range(k)], 
                           [x + 9 for x in range(k)]]).T,columns=list('abcd'))

where rows correspond to time and columns to angles, and it looks like this:

   a   b   c   d
0  0   1   4   9
1  1   2   5  10
2  2   3   6  11
3  3   4   7  12
4  4   5   8  13
5  5   6   9  14
6  6   7  10  15
7  7   8  11  16
8  8   9  12  17
9  9  10  13  18

Then, for reasons, I convert it to an ordered dictionary:

def highDimDF2Array(df):
    from collections import OrderedDict # Need to preserve order

    vels = [1.42,1.11,0.81,0.50]  # one colour value per column

    # Column names of the dataframe
    cols = df.columns

    trajectories = OrderedDict()
    for i,j in enumerate(cols):
        x = df[j].values
        # Remove construction NaNs
        x = x[~np.isnan(x)]

        maxTimeSteps = len(x)
        tmpTraj = np.empty((maxTimeSteps,3))
        # Column 0: time step index (this should be fast)
        tmpTraj[:,0] = range(maxTimeSteps)
        # Column 1: angle
        tmpTraj[:,1] = x
        # Column 2: constant colour value for this column
        tmpTraj[:,2].fill(vels[i])

        trajectories[j] = tmpTraj

    return trajectories
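If the intermediate dictionary is not needed, the stacked array M can also be built in one pass. This is a minimal sketch, assuming the same hard-coded colour values as `vels`; `df_to_points` is a hypothetical helper name, not part of the question's code.

```python
import numpy as np
import pandas as pd

def df_to_points(df, vels):
    """Stack every column of df into one (N, 3) array of [time index, angle, colour value].

    NaN entries are dropped per column, as in highDimDF2Array.
    """
    blocks = []
    for col, vel in zip(df.columns, vels):
        x = df[col].to_numpy(dtype=float)
        x = x[~np.isnan(x)]                  # drop NaN entries
        t = np.arange(len(x))                # time index 0..len(x)-1
        v = np.full(len(x), vel)             # constant colour value for this column
        blocks.append(np.column_stack((t, x, v)))
    return np.vstack(blocks)

k = 10
df = pd.DataFrame({'a': range(k), 'b': [x + 1 for x in range(k)],
                   'c': [x + 4 for x in range(k)], 'd': [x + 9 for x in range(k)]})
M = df_to_points(df, [1.42, 1.11, 0.81, 0.50])
print(M.shape)  # (40, 3)
```

Because `zip` stops at the shorter input, extra columns without a colour value are silently skipped rather than raising an `IndexError`.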

Then I plot it all:

import matplotlib.pyplot as plt
m = highDimDF2Array(df)
M = np.vstack(list(m.values()))  # rows of [time, angle, colour value]
plt.scatter(M[:,0],M[:,1],15,M[:,2])
plt.title(r'Angle $[^\circ]$ vs. Time $[s]$')
plt.colorbar()
plt.show()

Now all I want to do is to put all of that into a 2D numpy array with the properties:

Time is mapped to the x-axis (or the y-axis, it doesn't matter)
Angle is mapped to the y-axis
The entries in the matrix correspond to the values of the coloured dots in the scatter plot
All other entries are treated as NaNs (i.e. those that are undefined by a point in the scatter plot)

In 3D the colour would correspond to the height.

I was thinking of using something like this: 3d Numpy array to 2d but am not quite sure how.

Molly · Accepted Answer · 2016-09-13T22:30:28.403


You can convert the values in M[:,0] and M[:,1] to integers and use them as indices into a 2D NumPy array. Here's an example using the M you defined:

out = np.empty((20,10))       # rows: angle (0-18), columns: time (0-9)
out[:] = np.nan               # cells without a point stay NaN
N = M[:,[0,1]].astype(int)
out[N[:,1], N[:,0]] = M[:,2]  # row index = angle, column index = time
plt.scatter(M[:,0],M[:,1],15,M[:,2])
plt.title(r'Angle $[^\circ]$ vs. Time $[s]$')
plt.colorbar()
plt.imshow(out, interpolation='none', origin = 'lower')
plt.show()

Here the time and angle columns of M already hold whole numbers, so you can cast them to integers directly. In general, you might have to come up with a function that maps the columns of M to integer indices, depending on the resolution of the array you are creating.
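One possible mapping, sketched under the assumption that you want fixed-size cells starting at the smallest time and angle: divide each coordinate's offset by a cell size and take the floor. `to_grid`, `t_step` and `a_step` are hypothetical names chosen for illustration.

```python
import numpy as np

def to_grid(M, t_step=1.0, a_step=1.0):
    """Place values M[:, 2] on a NaN-filled 2D array using the coordinates
    M[:, 0] (time, -> columns) and M[:, 1] (angle, -> rows).

    Each cell is t_step wide in time and a_step tall in angle. If several
    points land in the same cell, only one of them is kept (NumPy does not
    guarantee which one for repeated fancy-index assignment).
    """
    t0, a0 = M[:, 0].min(), M[:, 1].min()
    cols = np.floor((M[:, 0] - t0) / t_step).astype(int)
    rows = np.floor((M[:, 1] - a0) / a_step).astype(int)
    out = np.full((rows.max() + 1, cols.max() + 1), np.nan)
    out[rows, cols] = M[:, 2]
    return out, (t0, a0)
```

With unit steps this reproduces the integer-index approach above; a call such as `to_grid(M, t_step=0.5, a_step=2.0)` would give half-second columns and 2-degree rows. The returned origin `(t0, a0)` is what you would need to build an `extent` for `imshow`.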

edited Sep 13 '16 at 22:30

answered Sep 13 '16 at 21:07

Molly


Shouldn't `M = M.astype(int)` be `M = M[:,[0,1]].astype(int)`? Otherwise you turn all of it into integers. – Astrid Sep 13 '16 at 22:13
`out[:] = np.NAN; N = M[:,[0,1]].astype(int); out[N[:,0], N[:,1]] = M[:,2]` – Astrid Sep 13 '16 at 22:15

dnalow · Answer 2 · 2016-09-14T12:11:38.480

I don't use pandas, so I cannot really follow what your function does. But from the description of your array M and what you want, I think the function np.histogram2d is what you need. It divides the range of your independent values into equally spaced bins and counts the points in each bin. If you pass your 3rd column as weights, it sums those values per bin instead, which gives you the height. You have to choose the number of bins:

z, x, y   = np.histogram2d(M[:,0], M[:,1], weights=M[:,2], bins=50)
num, x, y = np.histogram2d(M[:,0], M[:,1], bins=50)

z /= num # proper averaging, it also gives you NaN where num==0
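The plain `z /= num` emits a RuntimeWarning for every empty bin. A minimal sketch of the same averaging that fills empty bins with NaN without the 0/0 warning, using `np.divide` with `where=`; `binned_mean` is a hypothetical helper name.

```python
import numpy as np

def binned_mean(x, y, values, bins):
    """Average `values` over the 2D bins of (x, y); empty bins become NaN."""
    total, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=values)
    count, _, _ = np.histogram2d(x, y, bins=(xedges, yedges))
    mean = np.full_like(total, np.nan)
    # divide only where a bin holds at least one point, so no 0/0 warning
    np.divide(total, count, out=mean, where=count > 0)
    return mean, xedges, yedges
```

If SciPy is available, `scipy.stats.binned_statistic_2d` with `statistic='mean'` computes the same per-bin average in one call.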

plt.pcolor(x, y, z.T) # visualization; histogram2d puts x along the first axis, hence the transpose

plt.hist2d could also be interesting.

Edit: np.histogram2d yields the 2D array asked for in the question. The visualization, however, should be done with imshow, since pcolor doesn't skip NaN values (is there some way to teach it?)

The advantage of this method is that the x, y values can be floats and in arbitrary order. Further, by choosing the number of bins, one can set the resolution of the resulting image. To get exactly the result asked for, however, one should do:

binx = np.arange(M[:,0].min()-0.5, M[:,0].max()+1.5) # edges of the bins. 0.5 is the half width
biny = np.arange(M[:,1].min()-0.5, M[:,1].max()+1.5)

z,   x, y   = np.histogram2d(M[:,0], M[:,1], weights=M[:,2], bins=(binx,biny))
num, x, y   = np.histogram2d(M[:,0], M[:,1], bins=(binx,biny))

z /= num


plt.imshow(z.T, interpolation='none', origin = 'lower')
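As a sanity check on the unit-width bins, the sketch below rebuilds test data in the same layout as M, runs the exact-binning recipe, and compares it with the direct integer indexing from the accepted answer. The two only line up cell for cell because both time and angle start at 0 here; otherwise the histogram grid is offset by the minimum values.

```python
import numpy as np

# Test data in the same layout as M in the question: [time, angle, colour value]
k = 10
t = np.tile(np.arange(k), 4)
angle = np.concatenate([np.arange(k) + off for off in (0, 1, 4, 9)])
vel = np.repeat([1.42, 1.11, 0.81, 0.50], k)
M = np.column_stack((t, angle, vel))

# Unit-width bins centred on the integer time/angle values
binx = np.arange(M[:, 0].min() - 0.5, M[:, 0].max() + 1.5)
biny = np.arange(M[:, 1].min() - 0.5, M[:, 1].max() + 1.5)
z, _, _ = np.histogram2d(M[:, 0], M[:, 1], weights=M[:, 2], bins=(binx, biny))
num, _, _ = np.histogram2d(M[:, 0], M[:, 1], bins=(binx, biny))
with np.errstate(invalid='ignore'):   # 0/0 in empty bins -> NaN, silently
    z /= num

# Direct integer indexing (rows = angle, columns = time)
out = np.full((int(M[:, 1].max()) + 1, int(M[:, 0].max()) + 1), np.nan)
N = M[:, [0, 1]].astype(int)
out[N[:, 1], N[:, 0]] = M[:, 2]

print(np.allclose(z.T, out, equal_nan=True))  # True
```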

pcolormesh doesn't leave out the NaNs, but in exchange it takes the x and y bin edges into account:

plt.pcolormesh(x, y, z.T, vmin=0, vmax=2)
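On the question of teaching pcolor to skip NaNs: Matplotlib's pcolor/pcolormesh leave masked cells undrawn, so one option is to mask the NaNs explicitly with `np.ma.masked_invalid` before plotting. A minimal sketch; `plot_binned` is a hypothetical helper name, and the Agg backend is selected only so the snippet runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

def plot_binned(z, xedges, yedges):
    """Draw a histogram2d-style array with pcolormesh, leaving empty (NaN) bins blank."""
    zm = np.ma.masked_invalid(z)                 # mask NaN/inf so those cells are not drawn
    fig, ax = plt.subplots()
    mesh = ax.pcolormesh(xedges, yedges, zm.T)   # histogram2d puts x along axis 0, hence .T
    fig.colorbar(mesh, ax=ax)
    return fig, mesh
```

Called as `plot_binned(z, x, y)` with the arrays from the recipe above, it keeps the bin-edge coordinates of pcolormesh while leaving the empty bins blank, as imshow does.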

I'm not going to lie, I'm afraid it didn't work; it just creates one big pixel. — Astrid, Sep 13 '16 at 21:58
I would encourage you to try it again, since this is the more general approach to what you asked for. Nevertheless, I understand that the other answer above works in your special case. — dnalow, Sep 14 '16 at 09:16
Certainly, I'll post an image of the resulting plot to show you what I mean. Perhaps you have a plot of it working on your end? — Astrid, Sep 14 '16 at 09:34
