
Problem:
I have a data set with x and y value pairs, plus lower_limit and upper_limit values for y.

I want to plot x vs. y in a Plotly scatter plot and colour each marker green if lower_limit ≤ y ≤ upper_limit, else red.

I know that I could use 2 traces, or add a color column to the DataFrame. However, I'd like to generate these colours on the fly and use one trace only.

Example:
Consider this data set:

   x   y  lower_limit  upper_limit
0  1  13           10           15
1  2  13           15           20
2  3  17           15           20

The first marker (x=1, y=13) should be green, because lower_limit ≤ y ≤ upper_limit (10 ≤ 13 ≤ 15), just like the third one (15 ≤ 17 ≤ 20).
However, the second should be red, because y < lower_limit (13 < 15).

I then want to produce this graph: [image]


MWE:

import pandas as pd
import plotly.graph_objs as go
import plotly.offline as po

data = [
    [1, 13, 10, 15],
    [2, 13, 15, 20],
    [3, 17, 15, 20]
]

df = pd.DataFrame(
    data,
    columns=['x', 'y', 'lower_limit', 'upper_limit']
)

trace = go.Scatter(
    x=df['x'],
    y=df['y'],
    mode='markers',
    marker=dict(
        size=42,
        # I want the color to be green if 
        # lower_limit ≤ y ≤ upper_limit
        # else red
        color='green',
    )
)

po.plot([trace])
ebosi

2 Answers


I would suggest creating a new array to store the color values. The example below uses np.where and np.logical_and to build the conditional comparison.

import numpy as np
import pandas as pd
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

# Enable offline plotting inside a Jupyter notebook
init_notebook_mode(connected=True)

data = [
    [1, 13, 10, 15],
    [2, 13, 15, 20],
    [3, 17, 15, 20]
]

df = pd.DataFrame(
    data,
    columns=['x', 'y', 'lower_limit', 'upper_limit']
)

# Alternative (adds a column to df):
# df['color'] = np.where(np.logical_and(df['lower_limit'] <= df['y'], df['y'] <= df['upper_limit']), 'green', 'red')

trace = go.Scatter(
    x=df['x'],
    y=df['y'],
    mode='markers',
    marker=dict(
        size=42,
        # green if lower_limit ≤ y ≤ upper_limit, else red;
        # the array is computed inline, so no column is added to df
        color=np.where(np.logical_and(df['lower_limit'] <= df['y'], df['y'] <= df['upper_limit']), 'green', 'red'),
    )
)

iplot([trace])
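
To see what the expression passed to `color` evaluates to, here is a minimal sketch on the same sample data, with no plotting involved. The names `in_range` and `colors` are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 13, 10, 15], [2, 13, 15, 20], [3, 17, 15, 20]],
    columns=['x', 'y', 'lower_limit', 'upper_limit'],
)

# Element-wise test: True where lower_limit <= y <= upper_limit
in_range = np.logical_and(df['lower_limit'] <= df['y'], df['y'] <= df['upper_limit'])

# One color string per row; df itself is left unchanged
colors = np.where(in_range, 'green', 'red')
print(colors.tolist())  # ['green', 'red', 'green']
```

The resulting array has one entry per marker, which is the shape `marker.color` expects when given a list of colors.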

References:

  1. Pandas: np.where with multiple conditions on dataframes

  2. Pandas: Ternary conditional operator for setting a value in a DataFrame

alfredoc
Naren Murali
  • Thanks for your answer. However, as stated in the question, I'd like, as far as possible, not to add a column to the table. – ebosi May 01 '19 at 08:48
  • If you do this, the legend entries for the two colors (or n conditions) are not shown. Do you know how to show them? – Henry Navarro Nov 20 '20 at 08:51
1
import pandas as pd
import numpy as np
import plotly.express as px


df = pd.DataFrame({'x': {0: 1, 1: 2, 2: 3}, 'y': {0: 13, 1: 13, 2: 17}, 'lower_limit': {0: 10, 1: 15, 2: 15}, 'upper_limit': {0: 15, 1: 20, 2: 20}})

If you really don't want to add a column to df:

fig = px.scatter(df,
     x='x',
     y='y',
     color=np.where(df['y'].between(df['lower_limit'], df['upper_limit']), 'green', 'red'),
     color_discrete_sequence=pd.Series(np.where(df['y'].between(df['lower_limit'], df['upper_limit']), 'green', 'red')).drop_duplicates(),
     size=len(df)*[3])
fig.show()

Output:

Figure 1

If you don't mind a new column:

df['color'] = np.where(df['y'].between(df['lower_limit'], df['upper_limit']), 'green', 'red')

fig = px.scatter(df,
     x='x',
     y='y',
     color='color',
     color_discrete_sequence=df['color'].drop_duplicates(),
     size=len(df)*[3])
fig.show()

Same result:

Figure 2
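
Both variants rely on `Series.between`, which is inclusive at both ends by default and therefore matches lower_limit ≤ y ≤ upper_limit. The sketch below checks that against the explicit comparison on the sample data; `mask_between` and `mask_explicit` are illustrative names only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [13, 13, 17],
    'lower_limit': [10, 15, 15],
    'upper_limit': [15, 20, 20],
})

# between() compares row by row when given Series bounds
mask_between = df['y'].between(df['lower_limit'], df['upper_limit'])

# The same condition written out explicitly
mask_explicit = (df['lower_limit'] <= df['y']) & (df['y'] <= df['upper_limit'])

assert mask_between.equals(mask_explicit)
colors = np.where(mask_between, 'green', 'red')
```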

amance