I have written a basic plotly dash app that pulls in data from a csv and displays it on a chart. You can then toggle values on the app and the graph updates.

However, when I add new data to the csv (done once each day) the app doesn't update the data on refreshing the page.

The fix is normally that you define your app.layout as a function, as outlined here (scroll down to updates on page load). You'll see in my code below that I've done that.

Here's my code:

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import numpy as np

import pandas as pd

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

path = 'https://raw.githubusercontent.com/tbuckworth/Public/master/CSVTest.csv'

df = pd.read_csv(path)
df2 = df[(df.Map==df.Map)]


def layout_function():

    df = pd.read_csv(path)
    df2 = df[(df.Map==df.Map)]
    
    available_strats = np.append('ALL',pd.unique(df2.Map.sort_values()))
    classes1 = pd.unique(df2["class"].sort_values())
    metrics1 = pd.unique(df2.metric.sort_values())
    
    return html.Div([
            html.Div([
                dcc.Dropdown(
                    id="Strategy",
                    options=[{"label":i,"value":i} for i in available_strats],
                    value=list(available_strats[0:1]),
                    multi=True
                ),
                dcc.Dropdown(
                    id="Class1",
                    options=[{"label":i,"value":i} for i in classes1],
                    value=classes1[0]
                ),
                dcc.Dropdown(
                    id="Metric",
                    options=[{"label":i,"value":i} for i in metrics1],
                    value=metrics1[0]
                )],
            style={"width":"20%","display":"block"}),
                
        html.Hr(),
    
        dcc.Graph(id='Risk-Report')          
    ])
            
app.layout = layout_function


@app.callback(
        Output("Risk-Report","figure"),
        [Input("Strategy","value"),
         Input("Class1","value"),
         Input("Metric","value"),
         ])
def update_graph(selected_strat,selected_class,selected_metric):
    if 'ALL' in selected_strat:
        df3 = df2[(df2["class"]==selected_class)&(df2.metric==selected_metric)]
    else:
        df3 = df2[(df2.Map.isin(selected_strat))&(df2["class"]==selected_class)&(df2.metric==selected_metric)]
    df4 = df3.pivot_table(index=["Fund","Date","metric","class"],values="value",aggfunc="sum").reset_index()
    traces = []
    for i in df4.Fund.unique():
        df_by_fund = df4[df4["Fund"] == i]
        traces.append(dict(
                x=df_by_fund["Date"],
                y=df_by_fund["value"],
                mode="lines",
                name=i
                ))
    
    if selected_class=='USD':
        tick_format=None
    else:
        tick_format='.2%'
    
    return {
            'data': traces,
            'layout': dict(
                xaxis={'type': 'date', 'title': 'Date'},
                yaxis={'title': 'Values','tickformat':tick_format},
                margin={'l': 40, 'b': 40, 't': 10, 'r': 10},
                legend={'x': 0, 'y': 1},
                hovermode='closest'
            )
        }
    

if __name__ == '__main__':
    app.run_server(debug=True)

Things I've tried

  1. Removing the initial df = pd.read_csv(path) before the def layout_function():. This results in an error.
  2. Creating a callback button to refresh the data using this code:
@app.callback(
        Output('Output-1','children'),
        [Input('reload_button','n_clicks')]        
        )
def update_data(nclicks):
    if nclicks == 0:
        raise PreventUpdate
    else:
        df = pd.read_csv(path)
        df2 = df[(df.Map==df.Map)]
        return('Data refreshed. Click to refresh again')

This doesn't produce an error, but the button doesn't refresh the data either.

  3. Defining df within the update_graph callback. This reloads the data every time you toggle something, which is not practicable (my real data is > 10^6 rows, so I don't want to read it in every time the user changes a toggle value).

In short, I think that defining app.layout = layout_function should make this work, but it doesn't. What am I missing?

Appreciate any help.

Titus Buckworth
  • Is it possible that the data is being cached, and you are only getting what's still in the (stale) cache? I don't see any obvious issues with what you're doing. – coralvanda Aug 20 '20 at 22:49

1 Answer

TL;DR: I would suggest that you simply load the data from within the callback. If the load time is too long, you could change the file format (e.g. to feather) and/or reduce the data size via preprocessing. If this is still not fast enough, the next step would be to store the data in a server-side in-memory cache such as Redis.
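As a rough illustration of the caching idea (not Redis itself, and not something the question's code contains), here is a minimal in-process sketch that reloads the data only once it is older than a chosen age. The name TimedCache and its loader, max_age and clock parameters are hypothetical; in the app the loader would be something like lambda: pd.read_csv(path). It is not thread-safe, and each server process would hold its own copy.

```python
import time


class TimedCache:
    """Hold one loaded value and reload it once it is older than max_age seconds.

    Hypothetical helper for illustration only; it is not part of Dash or pandas.
    """

    def __init__(self, loader, max_age, clock=time.monotonic):
        self._loader = loader      # zero-argument function that loads the data
        self._max_age = max_age    # seconds before the cached value counts as stale
        self._clock = clock        # injectable so the behaviour can be checked
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._max_age
        if stale:
            # Only this branch pays the cost of reading the data again.
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

A callback would then call cache.get() instead of reading the file directly, so toggling a dropdown reuses the copy in memory until it expires.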


Since you are reassigning df and df2 in the layout_function, these variables are considered local in Python, and you are thus not modifying the df and df2 variables from the global scope. While you could achieve this behavior using the global keyword, the use of global variables is discouraged in Dash.
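The scoping behaviour can be seen in a stripped-down sketch with no Dash or pandas involved. The names data, reload_local and reload_global are made up for the illustration; data stands in for df.

```python
data = "old"  # module-level variable, standing in for the global df


def reload_local():
    # Assignment inside a function creates a new local name;
    # the module-level `data` is left untouched.
    data = "new"
    return data


def reload_global():
    # `global` makes the assignment rebind the module-level name instead.
    global data
    data = "new"
    return data


reload_local()
after_local = data    # still the original value

reload_global()
after_global = data   # now rebound to the new value
```

This is why the layout function in the question reloads the CSV on each page load while the callback keeps seeing the data that was read at start-up.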

The standard approach in Dash would be to load the data in a callback (or in the layout_function) and store it in a Store object (or equivalently, a hidden Div). The structure would be something like

import pandas as pd
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Output, Input, State
from dash.exceptions import PreventUpdate

app.layout = html.Div([
    ...
    dcc.Store(id="store"), html.Div(id="trigger")
])

@app.callback(Output('store','data'), [Input('trigger','children')], prevent_initial_call=False)
def update_data(children):
    df = pd.read_csv(path)
    return df.to_json()

@app.callback(Output("Risk-Report","figure"), [Input(...)], [State('store', 'data')])
def update_graph(..., data):
    if data is None:
        raise PreventUpdate
    df = pd.read_json(data)
    ...

However, this approach will typically be much slower than just reading the data from disk inside the callback (which seems to be what you are trying to avoid) as it results in the data being transferred between the server and client.

emher
  • Thank you for this detailed answer. This has helped me understand what's going on. Loading the data from the callback works but, as mentioned before, slows down the app. I will look into using feather to see if this speeds it up to an acceptable level. However, for my purposes, using the global keyword in the layout function makes the app behave exactly as I want. I know this is bad practice, but as I'm only using it to pull in the data (not change it in any session-specific way) I feel like this won't be a problem. (Also, for the time being, I will be the only user.) – Titus Buckworth Aug 21 '20 at 08:42