I have a pandas DataFrame containing a distance matrix, and I use PCA for dimensionality reduction. The DataFrame has a label and a size for each point.

How can I draw each scattered point as a circle whose size depends on the size stored in the DataFrame?

````
from sklearn.decomposition import PCA
import plotly.graph_objects as go

# dist: pandas DataFrame holding the distance matrix
pca = PCA(n_components=2)
pca.fit(dist)
mds5 = pca.components_

fig = go.Figure()
fig.add_scatter(
    x=mds5[0],
    y=mds5[1],
    mode='markers+text',
    marker=dict(size=8, color='blue'),
    text=dist.columns.values,
    textposition='top right',
)
````

I need the scatter plot to look something like this example. However, when I set the size for each point as suggested in related answers, I can't get the circles to overlap, and when they do overlap, zooming in makes them stop overlapping.

It sounds strange, but I need logic such that, if two circles overlap, the one with the smaller radius disappears. So:

  1. How do I keep the circle size the same regardless of the zoom?
  2. How do I write logic in Python to remove the smaller of two overlapping circles?

Jay Qadan
  • I think this is a duplicate of [this question](https://stackoverflow.com/questions/30313882/scatterplot-with-different-size-marker-and-color-from-pandas-dataframe). – Asmus Apr 17 '19 at 07:01
  • Possible duplicate of [Scatterplot with different size, marker, and color from pandas dataframe](https://stackoverflow.com/questions/30313882/scatterplot-with-different-size-marker-and-color-from-pandas-dataframe) – Asmus Apr 17 '19 at 07:01
  • @Asmus I just updated the question; see my notes below the code. Would you be able to help? Thanks. – Jay Qadan Apr 18 '19 at 00:33
  • I think [Scale matplotlib.pyplot.Axes.scatter markersize by x-scale](https://stackoverflow.com/questions/48172928/scale-matplotlib-pyplot-axes-scatter-markersize-by-x-scale/48174228#48174228) is more what you're looking for. – ImportanceOfBeingErnest Jun 12 '19 at 10:51

1 Answer

I'm still not sure which PCA parameter you want to be reflected in the circle size, but: either you want to

  • use a scatter plot (i.e. ax.scatter()) whose s= (marker size) reflects your chosen PCA parameter; this size will not (and should not) rescale when you rescale the figure, and it is given in points² rather than in (x,y)-units
  • use multiple plt.Circle((x,y), radius=radius, **kwargs) patches, whose radii are given in (x,y)-units; the point overlap is then consistent on rescale, but the circles will look deformed (elliptical) unless the axes use an equal aspect ratio

The following animation will emphasise the issue at hand: Rescaling different point plots

I suppose you want the plt.Circle-based solution, as it keeps the overlap between points fixed when you zoom. You then need to calculate beforehand whether two points overlap and remove the smaller one yourself. This can be automated by comparing the point sizes (i.e. the radii, your PCA parameter) with the Euclidean distance between the data points (i.e. np.sqrt(dx**2 + dy**2)): two circles overlap when that distance is smaller than the sum of their radii.
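As a minimal sketch of that comparison, assuming the coordinates and radii are plain arrays in the same (x,y)-units: the helper name drop_smaller_overlapping is hypothetical, and the greedy "largest first" rule is only one possible reading of "the smaller circle disappears".

```python
import numpy as np

def drop_smaller_overlapping(xs, ys, radii):
    """Return a boolean mask marking the circles to keep.

    Circles are visited from largest to smallest radius; a circle is
    dropped if it overlaps a circle that has already been kept.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    radii = np.asarray(radii, dtype=float)
    keep = np.zeros(len(xs), dtype=bool)
    # Stable sort on the negated radius: largest first, ties keep input order.
    for i in np.argsort(-radii, kind="stable"):
        kept = np.flatnonzero(keep)
        # Euclidean distance from circle i to every kept circle.
        dist = np.hypot(xs[kept] - xs[i], ys[kept] - ys[i])
        # Two circles overlap when the centre distance < sum of the radii.
        if not np.any(dist < radii[kept] + radii[i]):
            keep[i] = True
    return keep

# The middle circle overlaps the larger first one, so only it is dropped.
mask = drop_smaller_overlapping([0, 1, 5], [0, 0, 0], [1.0, 0.5, 1.0])
```

Note that with this rule a circle that only overlaps an already dropped circle is kept; if you would rather drop it as well, compare against all larger circles instead of only the kept ones.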

To use Circles, you could e.g. define a shorthand function:

import matplotlib.pyplot as plt

def my_circle_scatter(ax, x_array, y_array, radius=0.5, **kwargs):
    # Add one Circle patch per point; the radius is in (x,y)-units.
    for x, y in zip(x_array, y_array):
        circle = plt.Circle((x,y), radius=radius, **kwargs)
        ax.add_patch(circle)
    return True

and then call it with optional parameters (i.e. the x- and y-coordinates, colors, and so on):

my_circle_scatter(ax, xs, ys, radius=0.2, alpha=.5, color='b')

Here, I've used fig, ax = plt.subplots() to create the figure and the axes separately.
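Since the question asks for a different size per point, here is a hedged variant that takes one radius per point. The name circle_scatter and the sample coordinates are made up for illustration; the set_aspect and autoscale_view calls are there because, as far as I know, add_patch does not rescale the view on its own and unequal axis scales would draw the circles as ellipses.

```python
import matplotlib.pyplot as plt

def circle_scatter(ax, xs, ys, radii, **kwargs):
    """Draw one circle per point; radii are given in data (x, y) units."""
    patches = []
    for x, y, r in zip(xs, ys, radii):
        circle = plt.Circle((x, y), radius=r, **kwargs)
        ax.add_patch(circle)
        patches.append(circle)
    # Equal aspect keeps the circles round instead of elliptical.
    ax.set_aspect("equal")
    # Patches do not trigger a view update by themselves, so rescale here.
    ax.autoscale_view()
    return patches

# Illustrative data only: three points with individual radii.
fig, ax = plt.subplots()
patches = circle_scatter(ax, [0, 1, 5], [0, 0, 0], [1.0, 0.5, 1.0],
                         alpha=0.5, color="b")
```

Call plt.show() afterwards to display the figure; because the radii live in data units, zooming changes how large the circles look on screen but not whether they overlap.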

Asmus
  • The radius is not related to PCA or to x, y; it is an independent variable. With multiple `plt.Circle((x,y), radius=radius, **kwargs)` calls, do they need to be in a loop? Sorry, what does `**kwargs` stand for? – Jay Qadan Apr 18 '19 at 12:01
  • @JayQadan `**kwargs` is a placeholder for any additional **k**ey**w**ord **arg**ument**s** you might pass, e.g. `color="r"`. I've updated my answer to include an example of `Circle()` – Asmus Apr 18 '19 at 12:33
  • If you have many circles, you might want to consider using a [circle collection](https://matplotlib.org/3.1.1/api/collections_api.html#matplotlib.collections.CircleCollection) – Rotem Shalev Jan 19 '20 at 17:48