I'm trying to plot my data as a comparative time series, but I keep getting plots like this: (image: broken plot)

right now my data is formatted in a pandas DataFrame as so:

|   | Date                             | Name1          | Name2   | Name3 |
| 0 | Timestamp('2005-08-06 00:00:00') | 1.5            | NaN     | 3 |
| 1 | Timestamp('2003-09-07 00:00:00') | NaN            | 1.3     | 2 |
| 2 | Timestamp('2002-10-02 00:00:00') | 1.6            | NaN     | NaN |
| 3 | Timestamp('1996-11-02 00:00:00') | 1.6            | 1       | NaN |
| 4 | Timestamp('2005-10-02 00:00:00') | NaN            | NaN     | 1 |

and my process of plotting it goes as such:

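import dash
import pandas as pd
import plotly.express as px
from dash import dcc, html
from dash.dependencies import Input, Output

def order_names(d):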
    d_ret = dict()
    for i in d.keys():
        x_coordinates = d[i].keys()
        y_coordinates = d[i].values()
        x,y = zip(*sorted(zip(x_coordinates, y_coordinates)))
        d_star = {xi : yi for xi, yi in zip(x,y)}
        d_ret[i] = d_star
    return d_ret

d = get_data() #d = {name : {datetime.datetime : value...} ...}
d = order_names(d)
df = pd.DataFrame(d)
df.reset_index(inplace = True)
df.rename(columns={"index": "date"},inplace = True)



app = dash.Dash(__name__)

app.layout = html.Div([
    dcc.Dropdown(
        id='name',
        options=[{"label": x, "value": x} 
                 for x in df.columns[1:]],
        multi=True,
        value='Name1',
    ),
    dcc.Graph(id="name-chart"),
])

@app.callback(
    Output("name-chart", "figure"), 
    [Input("name", "value")])
def display_time_series(name):
    fig = px.line(df, x='date', y=name, title = "Value vs. Time")
    return fig

And I cannot figure out why dash/plotly is displaying the top graph out of order.

tl;dr: I'm trying to plot multiple time series which may or may not have matching dates

  • Providing a useful answer to this is next to impossible without a proper sample of your dataset that will reproduce the provided figure. Please take a look at [this](https://stackoverflow.com/questions/63163251/pandas-how-to-easily-share-a-sample-dataframe-using-df-to-dict/63163254#63163254) and provide a more useful data sample. – vestland Feb 20 '21 at 22:38

1 Answer

It's because of your messy data. This happens when you are missing x-axis values.

There are a couple of solutions you could try: python plotly time series handle missing dates corretly

And: https://community.plotly.com/t/scatterplot-lines-unwanted-connecting/8729

You need to ensure your data is all based on the same x-axis, and this x-axis must be ordered correctly. Based on your sample code, I'm not entirely convinced you are properly ordering the dataframe. You can't plot data that 'may or may not have matching dates'.
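As a minimal sketch of the ordering point, assuming the data has the `{name: {datetime: value}}` shape described in the question: sorting each inner dict does not guarantee that the combined index is sorted, so it may help to sort the DataFrame itself after building it. The sample values below only mirror the question's table and are not the asker's real data.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample in the question's shape: {name: {datetime: value}}
d = {
    "Name1": {dt.datetime(2005, 8, 6): 1.5,
              dt.datetime(2002, 10, 2): 1.6,
              dt.datetime(1996, 11, 2): 1.6},
    "Name2": {dt.datetime(2003, 9, 7): 1.3,
              dt.datetime(1996, 11, 2): 1.0},
}

df = pd.DataFrame(d)                  # index is the union of all dates
df.index = pd.to_datetime(df.index)   # make sure the index is datetime-like
df = df.sort_index()                  # order the shared x-axis chronologically
df = df.rename_axis("date").reset_index()
```

After this, every column shares one chronologically ordered `date` column, with `NaN` wherever a name has no value on that date.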

Also based on how much missing data it appears you have, I would highly recommend building a scatter plot with connected edges.

Devin Burke
  • It's not so much that the data was missing, more so that the events I'm plotting happen only 2-3 times per year per entity, and each entity sets its own schedule. Regardless, plotting this as a scatterplot with lines+markers worked wonderfully – Justin Cabot-Miller Feb 20 '21 at 23:19
  • Right, so you just have to be extremely diligent in ensuring there's a common x-axis. If you have random x-axis values then plotly will have no way to know what goes where. And if it only happens 2-3 times per year per entity, I would definitely do the scatter plot, which allows for gaps – Devin Burke Feb 20 '21 at 23:21