Better scale scatterplot points by size in plotly, some of the points are too small to see?

Question

When I build a scatterplot of this data, you can see that the one large value (462) completely swamps the other points, to the point where some of them can barely be seen.

Does anyone know of a specific way to normalize this data so that the small dots can still be seen, while maintaining a link between the size of the dot and the value? I'm wondering whether either of these would make sense:

(1) Set a minimum value for the size a dot can be

(2) Do some normalization of the data somehow, but I guess the large data point will always be 462 compared to some of the other points with a value of 1.

Just wondering how other people get around this, so they don't miss points on the plot that are actually there. Or is the most obvious answer simply not to scale the points by size, and instead add a label to each point showing its value?

score 1 · Accepted Answer · answered Oct 17 '21 at 19:58

You can use clip() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.clip.html) to cap the values used for the size parameter.
Full solution below:

import pandas as pd
import numpy as np
import plotly.express as px

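# Sample data: 25 classes with random values between 1 and 39
df = pd.DataFrame(
    {"Class": np.linspace(-8, 4, 25), "Values": np.random.randint(1, 40, 25)}
).assign(Class=lambda d: "class_" + d["Class"].astype(str))
# Inject one outlier that would otherwise dwarf every other marker
df.iloc[7, 1] = 462

# y still shows the true value; only the marker size is capped at 50
px.scatter(df, x="Class", y="Values", size=df["Values"].clip(0, 50))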

score 0 · Answer 2 · answered Oct 17 '21 at 18:13

This isn't really a Python question so much as a question about plotting style. There are several ways to solve the issue in your case:

(1) Split the data into equally sized categories and assign a color to each one. Your legend would then look something like this:

    0 - 1: color 1
    2 - 20: color 2
    ...

The way to implement this is to split your data into the sets you want and plot a separate scatter plot for each set, each with its own color.

(2) The second option, which is frequently used, is to use the log of the value for the bubble size. You would just have to point that out quite clearly in your legend.

(3) The third option is to limit the marker size to an arbitrary value. I personally am not a big fan of this method, since it distorts the information shown to a degree that the other alternatives don't, but if you add a data callout it would still be legitimate.

These options should be fairly easy to implement in code. If you are having difficulties, feel free to post runnable sample code and we could implement an example as well.
