I have dropdown menus for choosing the x- and y-axis variables of a scatter plot, plus a third dropdown for selecting the categorical variable used to color the points. This works for a few clicks, but then the hover box shows the literal text ‘%{customdata[0]}’ and the plot is no longer correct. I am using plotly 5.9.0 in JupyterLab 3. To make the coloring variable selectable, I add the traces of one px.scatter figure per categorical variable. Below is a reproducible example:

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

X = pd.DataFrame({  'num1': [1,2,3,4],
                    'num2': [40,30,20,10],
                    'num3': [0,1,2,3],
                    'cat1': ['A', 'A', 'A', 'B'],
                    'cat2': ['c', 's', 's', 's'],
                    'cat3': ['a', 'b', 'c', 'd']})

numerical_features   = sorted(X.select_dtypes(include=np.number).columns.tolist())
categorical_features = sorted(list(set(X.columns) - set(numerical_features)))

feature_1 = numerical_features[0]
feature_2 = numerical_features[1]

fig = go.Figure()

for categorical_feature_id in range(len(categorical_features)):

    fig.add_traces(list(px.scatter(X, x=feature_1, y=feature_2, color=categorical_features[categorical_feature_id],
                                         labels={feature_1:feature_1, feature_2:feature_2},
                                         hover_data=['cat3', 'num3']).select_traces()))

fig.update_layout(
        xaxis_title=feature_1,
        yaxis_title=feature_2,
        updatemenus=[
            {
                "buttons": [
                    {
                        "label": f"x - {x}",
                        "method": "update",
                        "args": [
                            {"x": [X[x]]},
                            {"xaxis": {"title": x}},
                        ],
                    }
                    for x in numerical_features
                ]
            },
            {
                "buttons": [
                    {
                        "label": f"y - {y}",
                        "method": "update",
                        "args": [
                            {"y": [X[y]]},
                            {"yaxis": {"title": y}}
                        ],
                    }
                    for y in numerical_features
                ],
                "y": 0.8,
            },
            {
                "buttons": [
                    {
                        "label": f"z - {categorical_features[categorical_feature_id]}",
                        "method": "update",
                        "args": [{'visible':    [False if (i<categorical_feature_id) or (i>categorical_feature_id) else True for i in range(len(categorical_features))]},
                                 {'showlegend': [False if (i<categorical_feature_id) or (i>categorical_feature_id) else True for i in range(len(categorical_features))]}]
                    }
                    for categorical_feature_id in range(len(categorical_features))
                ],
                "y": 0.6,
            }])
fig.show()

[Image: an example of how the figure looks after a few updates]

A similar issue has been discussed for R:

Dropdown menu for changing the color attribute of data in scatter plot (Plotly R)

I would be grateful for any help.

Cleb
  • Troubleshooting this example involves many things and is time-consuming, but when I looked for what was causing the custom data not to be handled, it turned out to be the hover data. Disabling it seems to be what you are looking for. Please verify and comment. – r-beginners Jul 13 '22 at 02:15
  • Thanks for the comment! Unfortunately, disabling the hover box does not seem to help. While I do not know how the hover box is updated by plotly, I would be surprised if it could affect the dataframe which is inputted to px.scatter (or the updates of the points shown in the scatter plot). I disabled the hover box with ```fig = go.Figure(layout=go.Layout(hovermode=False))```. – Wiktor Olszowy Jul 13 '22 at 14:59
  • I'll share the points made as code since they didn't come across in the comments, see [Colab](https://colab.research.google.com/drive/13dkZo0PJ7hAjowQFUcT6ZJkLRUv-E4sl?usp=sharing). Try disabling this. `hover_data=['cat3', 'num3']` – r-beginners Jul 14 '22 at 01:40
  • Thanks! But the coloring still does not work as it should (after a few clicks all points have the same color). The legend also does not work as it should (just like earlier: with ```hover_data```). – Wiktor Olszowy Jul 14 '22 at 06:32
  • Since you have confirmed this, we will disable the Colab link. If you already have an answer and the issue has not been resolved, I encourage you to post it to this [community](https://community.plotly.com/). – r-beginners Jul 14 '22 at 06:37
  • I posted this issue there before posting it on SO: https://community.plotly.com/t/update-layout-not-working-after-a-few-clicks/65744?u=wo222 – Wiktor Olszowy Jul 16 '22 at 18:06

1 Answer

Hello, I updated your code a little.

I think some data transformation is needed here.

I replaced px.scatter with go.Scatter(), and the hover box now seems to work.

I hope this does the trick.

import pandas as pd
import numpy as np
import seaborn as sns
import plotly.graph_objects as go
from collections import defaultdict

X = pd.DataFrame({  'num1': [1,2,3,4],
                    'num2': [40,30,20,10],
                    'num3': [0,1,2,3],
                    'cat1': ['A', 'A', 'A', 'B'],
                    'cat2': ['c', 's', 's', 's'],
                    'cat3': ['a', 'b', 'c', 'd']})

numerical_features   = sorted(X.select_dtypes(include=np.number).columns.tolist())
categorical_features = sorted(list(set(X.columns) - set(numerical_features)))

dfs_list = []

for categorical_feature in categorical_features:
    features = numerical_features.copy()
    features.append(categorical_feature)
    dfs_list.append(X[features].copy())

unique_classes = list(pd.unique(X[categorical_features].values.ravel()))
dict_cat_color = {unique_classes[i] : 'rgb' + str(sns.color_palette(n_colors=len(unique_classes))[i])
                  for i in range(len(unique_classes))}

features_w_cat = numerical_features.copy()
features_w_cat.append('cat')

for x in dfs_list:
    x.columns  = features_w_cat
    x["color"] = x.cat.map(dict_cat_color)

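# For each categorical feature, a list of booleans (one per legend dummy trace)
# telling whether that legend entry should be visible when the feature is selected
orDict = defaultdict(list)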

fig = go.Figure()

# Workaround for the legend: Adding empty scatter plots with customized color and text

for key in dict_cat_color.keys():

    fig.add_traces(go.Scatter(
        x             = [None],
        y             = [None],
        name          = key,
        marker_color  = dict_cat_color[key],
        mode          = "markers",
        showlegend    = True
    ))
    
    for categorical_feature in categorical_features:
        
        if key in X[categorical_feature].unique():
            orDict[categorical_feature].append(True)
        else:
            orDict[categorical_feature].append(False)

for index,df in enumerate(dfs_list):
    
    fig.add_traces(go.Scatter(
        x             = [None],
        y             = [None],
        marker_color  = df["color"],
        customdata    = df.loc[:, ["num1","num2","num3","cat"]],
        mode          = "markers",
        hovertemplate = 'num1=%{customdata[0]}<br>num2=%{customdata[1]}<br>num3=%{customdata[2]}<br>cat=%{customdata[3]}',
        showlegend    = False
    ))

fig.update_layout(
        xaxis_title = '',
        yaxis_title = '',
        updatemenus = [
            {
                "buttons": [
                    {
                        "label": f"x - {x}",
                        "method": "update",
                        "args": [
                            {"x": [X[x]]},
                            {"xaxis": {"title": x}},
                        ],
                    }
                    for x in numerical_features
                ]
            },
            {
                "buttons": [
                    {
                        "label": f"y - {y}",
                        "method": "update",
                        "args": [
                            {"y": [X[y]]},
                            {"yaxis": {"title": y}}
                        ],
                    }
                    for y in numerical_features
                ],
                "y": 0.8,
            },
            {
                "buttons": [
                    {
                        "label": f"z - {categorical_features[categorical_feature_id]}",
                        "method": "update",
                        "args": [{'visible': orDict[categorical_features[categorical_feature_id]] + [False if (i<categorical_feature_id) or (i>categorical_feature_id) else True for i in range(len(categorical_features))]}],
                    }
                    for categorical_feature_id in range(len(categorical_features))
                ],
                "y": 0.6,
            }])

fig.show()
Ghassen Sultana
  • Thanks! Now the hover box works, but the coloring is for traces, not categories of the different categorical variables. When I set ```showlegend=True```, the legend includes only one entry: 'trace 0' (/1/2). Do you know how to fix it? – Wiktor Olszowy Jul 14 '22 at 06:05
  • I updated the response: I added a color column that contains the color of each point depending on its category. – Ghassen Sultana Jul 14 '22 at 07:46
  • I am still figuring out a way to customise the legend. – Ghassen Sultana Jul 14 '22 at 11:10
  • I was using ```px``` and not ```go``` because it was easy to get the legend. One solution for ```go```, although very convoluted, would be to add traces across the categorical variables as well as across the different levels. But this seems overkill. – Wiktor Olszowy Jul 14 '22 at 12:35
  • I found a workaround for the colors: I added empty scatter traces with the color and text that I want, and then I filter them when the dropdown menu gets updated. – Ghassen Sultana Jul 14 '22 at 15:01
  • Thanks a lot! This works quite well. I am a bit concerned about speed, though. In my actual application, I have 1700 observations (1700 points in the scatter plot) and 8 categorical variables, and the plot is a bit sluggish. Another problem, albeit a small one, is that the legend items cannot be properly clicked on and off. Normally, clicking a legend item makes the respective points in the scatter plot disappear or reappear, but here the legend is made from dummy entries. – Wiktor Olszowy Jul 16 '22 at 10:51
  • Can you mark this question as answered? For the speed, you should create another question and try some changes like those in ( https://www.somesolvedproblems.com/2018/07/how-do-i-make-plotly-faster.html ). For the legend, we can add a legendgroup for the dummy entries and the traces with the same category. – Ghassen Sultana Jul 17 '22 at 21:56
  • I guess the figure is sluggish because it uses traces (these are only 1700 points). Thanks for the link! I tested replacing ```go.Scatter``` with ```go.Scattergl```, and the speed greatly improved. However, it causes other problems. ```go.Scatter``` is in two loops. If I replace it with ```go.Scattergl``` in both loops, hover information from the ```hovertemplate``` disappears. If I replace it in the first loop only, speed seems fine and ```hovertemplate``` works, but the coloring of the points does not work - all points have 1 color, even though the hover boxes have different/correct colors. – Wiktor Olszowy Jul 18 '22 at 09:30
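
The alternative raised in the comments above (one go trace per level of each categorical variable, so that legend entries are real traces) can be sketched as follows. This is a minimal sketch under assumptions, not a tested fix for the speed or Scattergl issues: the helper names build_traces, axis_button, color_button and make_figure are hypothetical, and the traces are built as plain dicts that go.Figure accepts.

```python
import pandas as pd

X = pd.DataFrame({'num1': [1, 2, 3, 4],
                  'num2': [40, 30, 20, 10],
                  'num3': [0, 1, 2, 3],
                  'cat1': ['A', 'A', 'A', 'B'],
                  'cat2': ['c', 's', 's', 's'],
                  'cat3': ['a', 'b', 'c', 'd']})
numerical_features = ['num1', 'num2', 'num3']
categorical_features = ['cat1', 'cat2', 'cat3']
hover_cols = ['cat3', 'num3']


def build_traces(df, x, y, cat_cols, hover_cols):
    """One trace dict per (categorical variable, level).

    Also returns, per trace, the owning variable and the row index of its
    points, so that buttons can send each trace its own subset of a column.
    """
    template = "<br>".join(f"{c}=%{{customdata[{i}]}}"
                           for i, c in enumerate(hover_cols))
    traces, owners, rows = [], [], []
    for cat in cat_cols:
        for level, sub in df.groupby(cat, sort=False):
            traces.append({
                "type": "scatter",
                "mode": "markers",
                "x": sub[x].tolist(),
                "y": sub[y].tolist(),
                "name": str(level),
                "customdata": sub[hover_cols].values.tolist(),
                "hovertemplate": template,
                # Only the first categorical variable is shown initially.
                "visible": cat == cat_cols[0],
            })
            owners.append(cat)
            rows.append(sub.index)
    return traces, owners, rows


def axis_button(df, col, axis, rows):
    """Button that gives every trace its own slice of column `col`."""
    return {"label": f"{axis} - {col}", "method": "update",
            "args": [{axis: [df.loc[r, col].tolist() for r in rows]},
                     {f"{axis}axis": {"title": col}}]}


def color_button(cat, owners):
    """Button that shows only the traces belonging to variable `cat`."""
    return {"label": f"z - {cat}", "method": "update",
            "args": [{"visible": [owner == cat for owner in owners]}]}


def make_figure(df, num_cols, cat_cols, hover_cols):
    import plotly.graph_objects as go  # imported here; only needed to draw
    traces, owners, rows = build_traces(df, num_cols[0], num_cols[1],
                                        cat_cols, hover_cols)
    fig = go.Figure(data=traces)
    fig.update_layout(
        xaxis_title=num_cols[0], yaxis_title=num_cols[1],
        updatemenus=[
            {"buttons": [axis_button(df, c, "x", rows) for c in num_cols]},
            {"buttons": [axis_button(df, c, "y", rows) for c in num_cols],
             "y": 0.8},
            {"buttons": [color_button(c, owners) for c in cat_cols],
             "y": 0.6},
        ])
    return fig


traces, owners, rows = build_traces(X, 'num1', 'num2',
                                    categorical_features, hover_cols)
```

Calling make_figure(X, numerical_features, categorical_features, hover_cols).show() should display the figure. Because every legend entry is an actual trace here, clicking it toggles the corresponding points, at the cost of one trace per level.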