
I am trying to add a vertical line to a Plotly line plot in Python. It seems to work, but Plotly sometimes misplaces the vertical line and I do not know why. The x-values are string timestamps of the form '10:45:21.000000' and the y-values are just integers.

Here is my code:

import plotly.graph_objects as go
import plotly.express as px

vert_line = '10:45:49.983727'

fig = px.line(data, x="time", y="y")
fig.add_shape(
        dict(
            type="line",
            x0=vert_line,
            y0=data['y'].min(),
            x1=vert_line,
            y1=data['y'].max(),
            line=dict(
                color="Red",
                width=3,
                dash="dot",
            )
))
fig.show()

I can post some toy data, but I noticed the behaviour is very inconsistent depending on the data I feed it. Here are some examples where I sliced the data differently. Each plot is based on the above code, with the data sliced as data[:100], data[:200], and data[:300] respectively:

[Three figures: the plots for data[:100], data[:200], and data[:300]]

Notice that the vertical line changes position and is never at its actual value. Why is this occurring? How can I get it to plot where it should be?

EDIT: As requested, here's some toy data to get you started. The issue depends on the exact slice of data, so it won't be reproducible with just this bit. The complete dataset is larger and I don't know a practical way to share it on Stack Overflow.

[{'time': '10:42:21.000000', 'y': 342688},
 {'time': '10:42:22.000000', 'y': 342700},
 {'time': '10:42:23.000000', 'y': 342681},
 {'time': '10:42:24.000000', 'y': 342680},
 {'time': '10:42:25.000000', 'y': 342692},
 {'time': '10:42:26.000000', 'y': 342696},
 {'time': '10:42:27.000000', 'y': 342699},
 {'time': '10:42:28.000000', 'y': 342727},
 {'time': '10:42:29.000000', 'y': 342725},
 {'time': '10:42:30.000000', 'y': 342731},
 {'time': '10:42:31.000000', 'y': 342735},
 {'time': '10:42:32.000000', 'y': 342750},
 {'time': '10:42:33.000000', 'y': 342750},
 {'time': '10:42:34.000000', 'y': 342725},
 {'time': '10:42:35.000000', 'y': 342700},
 {'time': '10:42:36.000000', 'y': 342725},
 {'time': '10:42:37.000000', 'y': 342725},
 {'time': '10:42:38.000000', 'y': 342700},
 {'time': '10:42:39.000000', 'y': 342700}]
guy

1 Answer

Complete snippet at the end


I've managed to reproduce your issue, and my preliminary conclusion is that it is caused by a bug. I'm basing this on the assumption that the variable 'vert_line' has a value that falls outside the x-range of your figures. As I will show, the specified shape seems to be placed in the middle of the figure if x0 and x1 fall outside the range displayed on the x-axis. Below I have recreated a dataset with a time column that mimics your real-world data, and I've set vert_line = '00:00:00.000044'.

This works fine for the first figure where '00:00:00.000044' is included in the x-axis range:

[Figure: line plot with the vertical line drawn at '00:00:00.000044']

Now see what happens if I change vert_line = '00:00:00.000044' to a value outside the displayed range or, as in your case, take another subset of the data with data = data[:40] so that the specified vert_line falls outside the range:

[Figure: line plot of data[:40] with the vertical line drawn in the middle of the x-axis]

For no apparent reason whatsoever, the shape is placed right in the middle of the figure, just as in all the figures you provided. I can't fix how this works, but you can make sure not to produce the shape if vert_line falls outside the displayed range.
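
A minimal sketch of such a check, assuming the x-values are plain strings as in the question. The helper name shape_is_placeable is made up for illustration and is not part of Plotly. The Plotly call is left as a comment so the check itself runs on its own:

```python
def shape_is_placeable(vert_line, x_values):
    """Return True when vert_line exactly matches one of the string x-values.

    Hypothetical helper: string x-values behave like categories here,
    so only an exact match can be located on the axis.
    """
    return vert_line in set(x_values)


# Toy x-values in the same format as the question's data
times = ['10:42:21.000000', '10:42:22.000000', '10:42:23.000000']

in_range = shape_is_placeable('10:42:22.000000', times)
out_of_range = shape_is_placeable('10:45:49.983727', times)
print(in_range, out_of_range)

# Intended use with the figure from the snippets in this answer:
# if shape_is_placeable(vert_line, data['time']):
#     fig.add_shape(dict(type="line", x0=vert_line, x1=vert_line,
#                        y0=data['y'].min(), y1=data['y'].max()))
```

With this guard the shape is simply skipped for a slice that does not contain vert_line, rather than being drawn in a misleading position.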

Complete code:

import plotly.graph_objects as go
import plotly.express as px
import random
import numpy as np
import pandas as pd

# data
np.random.seed(4)
n = 600
# 'us' (microseconds) and strftime replace the deprecated freq='U' and DatetimeIndex.format()
data = pd.DataFrame({'time': pd.date_range('2020', freq='us', periods=n).strftime('%H:%M:%S.%f'),
                      'y':np.random.uniform(low=-1, high=1, size=n).tolist()})
data['y']=data['y'].cumsum()

#vert_line = '10:45:49.983727'
#vert_line = random.choice(data['time'].to_list())
#vert_line = '00:00:00.000256'
vert_line = '00:00:00.000044'


data = data[:40]
fig = px.line(data, x="time", y="y")
fig.add_shape(
        dict(
            type="line",
            x0=vert_line,
            y0=data['y'].min(),
            x1=vert_line,
            y1=data['y'].max(),
            line=dict(
                color="Red",
                width=3,
                dash="dot",
            )
))

fig.update_layout(title=vert_line)
fig.update_xaxes(tickangle=90)

fig.show()

Edit: Test for different values of vert_line and a subset of the original data

The following snippet sets up a dataset with 100 observations, selects a random vert_line value from those observations, but splits the dataset in two before the figure is produced. This way, there will only be a 50% chance that vert_line stays in the range of the figure. Run it a few times, and you'll see that the shape is shown exactly as it's supposed to be as long as vert_line can be found on the x-axis. As soon as it can't, the shape is just placed there in the middle.

import plotly.graph_objects as go
import plotly.express as px
import random
import numpy as np
import pandas as pd

# data
np.random.seed(4)
n = 100
# 'us' (microseconds) and strftime replace the deprecated freq='U' and DatetimeIndex.format()
data = pd.DataFrame({'time': pd.date_range('2020', freq='us', periods=n).strftime('%H:%M:%S.%f'),
                      'y':np.random.uniform(low=-1, high=1, size=n).tolist()})
data['y']=data['y'].cumsum()

#vert_line = '10:45:49.983727'
vert_line = random.choice(data['time'].to_list())
#vert_line = '00:00:00.000256'
#vert_line = '00:00:00.000044'


data = data[:50]
fig = px.line(data, x="time", y="y")
fig.add_shape(
        dict(
            type="line",
            x0=vert_line,
            y0=data['y'].min(),
            x1=vert_line,
            y1=data['y'].max(),
            line=dict(
                color="Red",
                width=3,
                dash="dot",
            )
))

fig.update_layout(title=vert_line)
fig.update_xaxes(tickangle=90)

fig.show()
vestland
  • `I'm basing this conclusion on an assumption that the variable 'vert_line' has a value that falls outside the x-range for your figures.` I assure you that my vertical line value falls within my values and not outside them. But I appreciate you digging into this. – guy Sep 02 '20 at 22:28
  • @guy Well, you should at least double check. I'll provide another snippet that at least proves that there has to be a bug as soon as the value is not included in the range. – vestland Sep 02 '20 at 22:30
  • @guy I'm pretty sure I'm right about this. And it really is the only viable conclusion as long as you're not sharing proper data samples to replicate the problem. It's not hard at all. Just try a few `vert_line` values and `data[:n]` slices where your problem turns up and `n` is not too large. You can easily share a sample of your dataframe that reproduces your error by following the simple steps outlined [here](https://stackoverflow.com/questions/63163251/pandas-how-to-easily-share-a-sample-dataframe-using-df-to-dict/63163254#63163254) – vestland Sep 02 '20 at 22:40
  • @guy Actually, `vert_line = '10:45:49.983727'` appears in ***none*** of your figures. As you say yourself, `vert_line` is of type string, which means it's a categorical variable. That also means you need an exact match for plotly to know where to put your shape. As far as I can tell, ***all*** your x-axis values end with `'000000'` – vestland Sep 02 '20 at 22:49
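
Following on from the last comment, one possible workaround is to snap vert_line to the closest timestamp that actually exists on the axis before drawing the shape. This is only a sketch under the assumption that every x-value has the form 'HH:MM:SS.ffffff'. The helper name nearest_category is hypothetical and not a Plotly function:

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S.%f"  # assumed layout of the string timestamps


def nearest_category(vert_line, x_values):
    """Return the x-value whose parsed time is closest to vert_line.

    Hypothetical helper: it picks an existing string so that the shape
    coordinate matches a category on the axis exactly.
    """
    target = datetime.strptime(vert_line, TIME_FORMAT)
    return min(
        x_values,
        key=lambda s: abs(datetime.strptime(s, TIME_FORMAT) - target),
    )


# Toy x-values; like the question's data they all end in '000000'
times = ['10:45:48.000000', '10:45:49.000000',
         '10:45:50.000000', '10:45:51.000000']

snapped = nearest_category('10:45:49.983727', times)
print(snapped)  # 10:45:50.000000

# Intended use: pass x0=snapped and x1=snapped to fig.add_shape(...)
```

Note that a value far outside the plotted slice would be pinned to the first or last point, so it may be worth checking the range before snapping. Another route, not shown here, would be to convert the column to real datetimes (for example with pd.to_datetime) so that the x-axis is continuous rather than categorical.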