Scatter plot two feature vector set in same figure

Question

I want to plot two feature vectors as scatter plots in the same figure. I am doing PCA on MNIST.

The current feature vector, let's call it Elements, has 784 rows.

print Elements.shape
(784,)

I want to plot Elements[-20] and Elements[-19] as scatter plots in the same figure, and achieve something like the figure below.

I am struggling to add both elements to the same plot with different colors.

plt.scatter(X[-20], X[-19], c= 'r') yields only one color and no distinction between the scattered values.

As highlighted below, some of my data sets overlap, so the solution from another SO answer doesn't work.

The first 20 elements of X[-20] are shown below.

0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  2.84343259e-03  6.22613687e-03 -7.95592208e-15 -1.69063344e-14
  1.34798763e-14  0.00000000e+00  6.36473767e-14 -3.18236883e-14

You can plot them one after another and then call `plt.show()`. For scatter plots you need an array of Xes and an array of Ys, can you show what is in `Elements[-20]` and `Elements[-19]`? I think you are missing the Xes. — alec_djinn, Aug 02 '19 at 16:22
Is `Elements[-20]` a list of more values, an array, or a single number? Can you provide a subset of that data? Is this from your PCA results or do you still need to do the PCA? — BenT, Aug 02 '19 at 21:35
@BenT it's a NumPy array of length 784. It's my PCA result. I updated the question with the first 20 of the 784 elements — oneday, Aug 02 '19 at 21:52
I know that `Elements` is an array of 784, but is `Elements[-20]` also an array of 784? Do you have a way of determining which values you want to be red versus green? You need some threshold condition that determines this difference, for example all numbers greater than 3 are green. Could you determine the color with `Elements[-18]`? Otherwise, are you looking for a clustering algorithm? — BenT, Aug 02 '19 at 23:05
A scatter plot requires x and y values. Currently you only have one coordinate available. So I suppose you either left something out when asking, or your problem starts much earlier than wanting to plot a scatter. — ImportanceOfBeingErnest, Aug 04 '19 at 17:51

score 3 · Accepted Answer · answered Jul 30 '19 at 12:37

Regarding the visualization issue

You seem to be adding a scalar to your plot. What you need to do is separate your data first, and then do a plot for each of the sets. Like this:

import numpy as np
import matplotlib.pyplot as plt

def populate(a=2,b=5,dev=10, number=400):
    X = np.random.uniform(0, 50, number)
    Y = a*X+b + np.random.normal(0, dev, X.shape[0])
    return X, Y

num = 3000
x1, y1 = populate(number=num)
x2, y2 = populate(-0.2, 110, number=num)

x = np.hstack((x1, x2))
y = np.hstack((y1, y2))

fig, ax = plt.subplots(nrows=1, ncols=1)

# plot each population on the same axes with its own color
ax.scatter(x[:num], y[:num], color="blue", alpha=0.3)
ax.scatter(x[num:], y[num:], color="red", alpha=0.3)

# dark background so the semi-transparent points stand out
howblack = 0.15
ax.set_facecolor((howblack, howblack, howblack))
plt.show()

This results in the following plot:

There are numerical procedures to separate your data, but that is not a visualization issue. See scikit-learn for some clustering methods. In your example, assuming Elements is some kind of array, you need to find a way to separate the data before plotting it.

Regarding the feature vector

A scatter plot typically assumes that you have at least X and Y data (so 2D or more).

You seem to be referring to a single feature vector, which is not enough information: a vector with over 700 dimensions is not easy to show directly. So you need to decide what is X, what is Y, and what to separate into differently colored populations in your scatter plot.
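To make those three decisions concrete, here is a minimal sketch. The `features` and `labels` arrays are hypothetical random stand-ins (they are not the asker's data), and using columns -20 and -19 as the axes is only an assumption about what is wanted.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 samples, 784 features, labels 5 or 6.
features = rng.normal(size=(200, 784))
labels = rng.choice([5, 6], size=200)

# Decisions 1 and 2: which two features become the X and Y axes.
x_col, y_col = -20, -19

# Decision 3: which samples belong to which colored population.
is_five = labels == 5

fig, ax = plt.subplots()
ax.scatter(features[is_five, x_col], features[is_five, y_col],
           c="r", alpha=0.5, label="5")
ax.scatter(features[~is_five, x_col], features[~is_five, y_col],
           c="g", alpha=0.5, label="6")
ax.set_xlabel("feature -20")
ax.set_ylabel("feature -19")
ax.legend()
```

Calling `plt.show()` afterwards displays the figure. The `alpha` value keeps overlapping points from hiding each other completely.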

Thanks for your comment. I probably should have phrased it better. Essentially I have two elements, X and Y, each with 780 elements inside. I think we can ignore the 780 elements part. The X data elements represent 5 and the Y data elements represent 6, and I want to create a scatter plot of them. I don't have the liberty to massage the data. — oneday, Jul 30 '19 at 22:47

score 2 · Answer 2 · answered Aug 02 '19 at 14:46

I'm presuming that your X[-20] and X[-19] have all the necessary data to plot. In this case you just need to repeat the scatter plot command.

plt.figure()
plt.scatter(X[-20], c= 'r')
plt.scatter( X[-19], c= 'g')
plt.show()

Giving an example of your dataset might help if the above code isn't what you are looking for.

NAP_time


Thanks for the suggestion, but scatter requires two arguments – oneday Aug 02 '19 at 21:43
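As the comment points out, `plt.scatter` needs both x and y values. One possible way to show two 1-D vectors in the same figure is to use the element index as the shared x coordinate. This is only an assumption about what is wanted, and `v20` and `v19` below are random stand-ins for `X[-20]` and `X[-19]`.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Random stand-ins for X[-20] and X[-19]: two 1-D vectors of length 784.
v20 = rng.normal(scale=1e-2, size=784)
v19 = rng.normal(scale=1e-2, size=784)

# Use the element index as the shared x coordinate.
idx = np.arange(v20.size)

fig, ax = plt.subplots()
ax.scatter(idx, v20, c="r", s=8, alpha=0.5, label="X[-20]")
ax.scatter(idx, v19, c="g", s=8, alpha=0.5, label="X[-19]")
ax.set_xlabel("element index")
ax.set_ylabel("value")
ax.legend()
```

Each `scatter` call adds one colored population to the same axes, so the two vectors stay distinguishable even where they overlap.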

Mohsin hasan · Answer 3 · 2019-08-06T06:55:39.157

The question to some extent lacks clarity, so I will make some assumptions and answer it.

Let's say you picked 1000 samples (28*28 grayscale images) of digits 5 and 6 from MNIST. Your input array and label array will then have shapes (1000, 784) and (1000,). I will make some random arrays to demonstrate.

import numpy as np

a = np.random.rand(1000, 784)
b = np.random.choice([5, 6], size=1000)

Now, I will perform PCA on my data preserving all components.

from sklearn.decomposition import PCA

pca = PCA(784)
X = pca.fit_transform(a)

The shape of X now is (1000, 784).
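For intuition about where that shape comes from, here is a plain NumPy sketch of a PCA projection using the SVD. It is not scikit-learn's implementation, only an illustration under the same assumptions (random data of shape (1000, 784), all components kept), and individual component signs may differ from what `PCA` returns.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1000, 784))

# Center each feature, as PCA does before decomposing.
a_centered = a - a.mean(axis=0)

# Rows of vt are the principal directions, ordered by decreasing variance.
u, s, vt = np.linalg.svd(a_centered, full_matrices=False)

# Project the samples onto all principal directions:
# one row per sample, one column per component.
X = a_centered @ vt.T
print(X.shape)
```

Column `j` of `X` holds the coordinate of every sample along component `j`, which is why indexing like `X[rows, c1]` selects one component for a subset of samples.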

The array X in your case is transposed relative to this one. You can just do X = X.T and follow the rest of the answer.
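A quick shape check shows what the transpose does. The array `X_rows` below is a random stand-in with an assumed layout of (n_components, n_samples); it is not real PCA output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an array laid out as (n_components, n_samples).
X_rows = rng.normal(size=(784, 1000))

# Transpose so that rows are samples and columns are components.
X = X_rows.T

# Row -20 of the original layout is column -20 after the transpose.
same = np.array_equal(X_rows[-20], X[:, -20])
print(X.shape, same)
```

After the transpose, `X[:, -20]` and `X[:, -19]` play the role that `X[-20]` and `X[-19]` had in the original layout.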

As a next step, you would want to visualise how different components separate digits 5 and 6. Let's take components 19 and 20, as per your question.

import matplotlib.pyplot as plt

# get all unique digits
digits = np.unique(b)

# assign a color to each digit using a colormap
colors = plt.cm.Set1(digits)

# loop over digits and scatter components c1 and c2 for each one
c1 = 19
c2 = 20
for i in range(len(digits)):
  rows = b == digits[i]
  plt.scatter(X[rows, c1], X[rows, c2], c=[colors[i]], label=str(digits[i]))
plt.legend()
plt.show()

On how to set colormap, refer to this awesome answer

I get the following image when I execute the above commands.

Andrea Mannari · Answer 4 · 2019-08-06T13:06:41.467

Let's load the digits dataset from scikit-learn (a small MNIST-like set in which every digit is 8x8)

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()

Let's build an array x with the data for the digit 5 and an array y with the data for the digit 6

j=0
k=0
x_target=5
y_target=6
for i, val in enumerate(digits.target):
    if val ==x_target:
        if j==0:
            x=digits.data[i,:][:,np.newaxis].T
        else:
            x=np.concatenate([x,digits.data[i,:][:,np.newaxis].T])
        j=j+1
    if val ==y_target:
        if k==0:
            y=digits.data[i,:][:,np.newaxis].T
        else:
            y=np.concatenate([y,digits.data[i,:][:,np.newaxis].T])
        k=k+1

The shape of x is:

x.shape
Out[3]: (182, 64)

and the shape of y is:

y.shape
Out[4]: (181, 64)
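The same selection can also be written with boolean masks instead of growing the arrays row by row. A minimal sketch follows; `data` and `target` are small synthetic stand-ins for `digits.data` and `digits.target` (assumed names, used so the example runs without scikit-learn).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for digits.data (n_samples, 64) and digits.target (n_samples,).
data = rng.integers(0, 17, size=(300, 64)).astype(float)
target = rng.integers(0, 10, size=300)

# A boolean mask picks every row whose label equals the wanted digit.
x = data[target == 5]
y = data[target == 6]

# Same rows as collecting them one by one and concatenating.
x_loop = np.concatenate(
    [data[i][np.newaxis, :] for i in range(len(target)) if target[i] == 5]
)
assert np.array_equal(x, x_loop)
```

With the real dataset the equivalent would be `x = digits.data[digits.target == 5]` and `y = digits.data[digits.target == 6]`.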

You can then draw the scatter plot with red points for the values of the digit 5 and blue points for the values of the digit 6:

plt.scatter(x[:, -19], x[:, -20],c='r',alpha=0.5)
plt.scatter(y[:, -19], y[:, -20],c='b',alpha=0.5)
