2

I am trying to create a chart to plot the historical data of apple stock alongside earthquake occurrences. I have two data frames, one with historical apple stock data and another with historical earthquake data. I want to display each earthquake occurrence as a marker or shape relative to the apple stock price on that date.

Question

  1. How do I plot the earthquake events as markers or shapes relative to the apple chart?
  2. How do I handle the markers or shapes with multiple earthquakes and prevent them from overlapping or overwriting?

Apple Data

Date AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted dn mavg up direction
0 2015-02-17 00:00:00+00:00 127.49 128.88 126.92 127.83 63152400 122.905 106.741 117.928 129.114 Increasing
1 2015-02-18 00:00:00+00:00 127.63 128.78 127.45 128.72 44891700 123.761 107.842 118.94 130.038 Increasing
2 2015-02-19 00:00:00+00:00 128.48 129.03 128.33 128.45 37362400 123.501 108.894 119.889 130.884 Decreasing
3 2015-02-20 00:00:00+00:00 128.62 129.5 128.05 129.5 48948400 124.511 109.785 120.764 131.742 Increasing
4 2015-02-23 00:00:00+00:00 130.02 133 129.66 133 70974100 127.876 110.373 121.72 133.068 Increasing

Earthquake Data

Date Latitude Longitude Magnitude
22539 2015-02-17 00:00:00+00:00 40.1095 141.891 5.5
22540 2015-02-17 00:00:00+00:00 39.5696 143.583 5.5
22541 2015-02-18 00:00:00+00:00 8.3227 -103.159 5.5
22542 2015-02-18 00:00:00+00:00 8.285 -103.054 5.5
22543 2015-02-18 00:00:00+00:00 -10.7598 164.122 6.1

My current code

import pandas as pd
import plotly.graph_objects as go

if __name__ == '__main__':
    # Create dataframe of historical apple stock and earth quakes
    df_apple_stock = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
    df_earthquakes = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')
    
    # Convert data column to UTC datetime
    df_apple_stock['Date'] = pd.to_datetime(df_apple_stock['Date'], utc=True)
    df_earthquakes['Date'] = pd.to_datetime(df_earthquakes['Date'], utc=True)
    
    # Trim earthquake data to be only of 2015-2016
    start_day = pd.to_datetime('02/17/2015', utc=True)
    end_day = pd.to_datetime('12/31/2016', utc=True)    
    
    df_earthquakes = df_earthquakes[df_earthquakes['Date'].between(start_day, end_day)]  
    
    fig = go.Figure(data=[go.Scatter(x=df_apple_stock['Date'],
                                     y=df_apple_stock['AAPL.Close'],
                                     customdata=df_apple_stock,
                                     mode='lines',  # lines+markers
                                     # marker=dict(
                                     #     size=5,
                                     #     line=dict(width=2, color='DarkSlateGrey')
                                     # ),
                                     # hoveron='points',
                                     hovertemplate=
                                     '<b>%{x}</b><br>' +
                                     'open: %{customdata[1]:$.2f} <br>' +
                                     'close: %{y:$.2f} <br>' +
                                     'high: %{customdata[2]:$.2f} <br>' +
                                     'low: %{customdata[3]:$.2f} <br>' +
                                     'volume: %{customdata[5]:,}'
                                     # '<extra>test</extra>'
                                     )])
    
    fig.show()

Current Plot

Example Desired Outcome

desired

What I tried

  1. I tried iterating over each earthquake row and adding an annotation; however, this has issues:
    • I am unable to figure out how to position the earthquake annotations relative to the Apple stock price
    • If there are multiple earthquakes in one day then only one of them will be shown
    • It can take a long time to iterate through every single row in a larger data set
for _, row in df_earthquakes.iterrows():
    fig.add_annotation(font=dict(color='red', size=15),
                       x=str(row.Date),
                       y=125,  # how do I reference 'y' from apple stock price?
                       showarrow=False,
                       text="Event",
                       align="left",
                       hovertext=("Date: " + str(row.Date) + "<br>" +
                                  "Magnitude: " + str(row.Magnitude) + "<br>" +
                                  "Latitude: " + str(row.Latitude) + "<br>" +
                                  "Longitude: " + str(row.Longitude)),
                     xanchor='left')
  1. Plotting two traces within a Scatter plot and using the %{xother}
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_apple_stock['Date'],
    y=df_apple_stock['AAPL.Close'],
    fill='tozeroy',
    hovertemplate="%{y}%{_xother}"
))

fig.add_trace(go.Scatter(
    x=df_earthquakes['Date'],
    y=df_earthquakes['Magnitude'],
    fill='tonexty',
    hovertemplate="%{y}%{_xother}",
))

fig.update_layout(hovermode="x unified")
  1. I tried looking up how to add data from multiple periods of data and came across Hover Templates with Mixtures of Period data, but I was not able to get it working as I would like
  2. I tried reading into the documentation, markers, annotations, shared axis on subplots
vestland
  • 55,229
  • 37
  • 187
  • 305
J. Doe
  • 55
  • 1
  • 7

2 Answers2

2

I've set up a suggestion that should take care of your concerns. I'm using a built-in dataset and some random choices for duplicated dates. If you would like me to work on a sample of your actual dataset, please include that using the approach described here.

First suggestion:

1. Main trace is added to figure with fig.add_traces(go.Scatter)

2. Dates with quakes are arranged in two different datasets; one showing dates with single events, and one with duplicate dates.

3. Duplicate dates are organized in multiple = quakes[quakes.date.duplicated()] for which each and every record are assigned to a single trace. This will let you set up different symbols and hoverdata as you wish.

4. Values that belong to duplicated dates are offset on the y-axis compared to each other to ensure that the corresponding annotations are not overlapping or overwritten.

If this comes close to your desired result, we can talk details when you find the time.

Plot:

enter image description here

Code 1

# imports
import pandas as pd
import plotly.express as px
import random
import numpy as np
import plotly.graph_objects as go
from plotly.validators.scatter.marker import SymbolValidator
from itertools import cycle

np.random.seed(123)

# data
df = px.data.stocks()
df = df.drop(['GOOG', 'AMZN', 'NFLX', 'FB'], axis = 1).tail(150)

# simule
quakes =pd.DataFrame()

dates = [random.choice(df.date.values) for obs in range(0, 6)]
dates.extend([df.date.iloc[2], df.date.iloc[2], df.date.iloc[6], df.date.iloc[6], df.date.iloc[6]])

# synthetic data for earthquakes
quakes['date'] = dates
quakes['magnitude'] = [np.random.uniform(5,7) for obs in quakes.date]
quakes = pd.concat([quakes, quakes.groupby('date').cumcount().to_frame('dupes')], axis = 1)

# find dates with multiple quakes
multiple = quakes[quakes.date.duplicated()].sort_values('date').reset_index()#.sorted()

# find dates where only one quake occurs (to keep number of traces at a minimum)
single = quakes[~quakes.date.duplicated()].sort_values('date').reset_index()
single = pd.merge(df, single, on = 'date', how = 'right')

fig = go.Figure(go.Scatter(x = df['date'], y = df['AAPL'], name = 'Apple'))
fig.add_traces(go.Scatter(x=single['date'], y =single['AAPL'],
                          mode = 'markers',
                          name = 'days with quakes',
                          showlegend = True,
                          marker = dict(symbol = 'square', size = single['magnitude']**2)))

symbols = cycle(['circle', 'hexagon', 'diamond', 'star'])
annotations = []
for i, r in multiple.iterrows():
    fig.add_traces(go.Scatter(x=[r['date']], y = df[df['date']==r['date']]['AAPL']*(1 + r['dupes']/10),
                              mode = 'markers',
                              name = r['date'],
                              marker = dict(symbol = next(symbols), size = r['magnitude']**2)))
    annotations.append([r['date'], df[df['date']==r['date']]['AAPL']*(1 + r['dupes']/10), r['magnitude']])

# annotate single events
for i, q in enumerate(fig.data[1].x):
        fig.add_annotation(x=q, y = fig.data[1].y[i],
                       text = str(fig.data[1].y[i])[:3], showarrow = False,
                       font = dict(size = 10),
                       yref = 'y',
                       ay=0)
    

# annotate duplicates
for a in annotations:
    fig.add_annotation(x=a[0], y = a[1].item(),
                       text = str(a[2])[:4], showarrow = False,
                       font = dict(size = 10),
                       yref = 'y',
                       ay=0)
fig.show()
        
        
        
        
vestland
  • 55,229
  • 37
  • 187
  • 305
  • This is approach is what I am trying to accomplish. However, I am running into some issues in which I am not sure how to fix. There are dates where earthquakes occur when the stock market is closed (such as weekends or holidays) and there is no axis to plot the earthquakes. – J. Doe Sep 10 '21 at 09:24
  • I also switched the single's annotation from `text = str(fig.data[1].y[i])[:3]` to `text=str(single['magnitude'][i])[:4]` to show the magnitude on the first occurrence. Can you explain what `df[df['date'] == r['date']]['AAPL'] * (1 + r['dupes'] / 10)` and `single['magnitude']**2` does? – J. Doe Sep 10 '21 at 09:51
  • @J.Doe Sure! But lets not flood the comments section, and rather [chat here](https://chat.stackoverflow.com/rooms/236964/room-for-vestland-and-j-doe) – vestland Sep 10 '21 at 10:39
  • 1
    @J.Doe But the short story is this: `df[df['date'] == r['date']]['AAPL']` retrieves the value of `AAPL` on *one* date where there are several quakes. `(1 + r['dupes'] / 10)` then **offset** the indicator and annotation to make sure that they don't overwrite eachother. In the case where there's *one* duplicate, `(1 + r['dupes'] / 10)` gets the value `1.1` - in other words and increase of 10%. In the case where there are *two* duplicates, the second duplicate gets the value `1.2`. A suitable value will of course depend on the range of the `y-axis`. – vestland Sep 10 '21 at 10:58
  • @J.Doe Regarding quakes that occur on dates when the stock market is closed, I would consider reassigning those to the first date the market is open after the date. If the quakes have any effect on the prices, this is when you would see them. Another option is to merge your stock data to a series of *continuous* dates and plot the quakes against that. – vestland Sep 10 '21 at 13:09
1
  • you can plot on a secondary y-axis two y-axis
  • have used plotly express to plot earthquakes, then transferred traces and layout to other all figure
from plotly.subplots import make_subplots
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px


df_apple_stock = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv"
)
df_earthquakes = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv"
)

# Convert data column to UTC datetime
df_apple_stock["Date"] = pd.to_datetime(df_apple_stock["Date"], utc=True)
df_earthquakes["Date"] = pd.to_datetime(df_earthquakes["Date"], utc=True)

# Trim earthquake data to be only of 2015-2016
start_day = pd.to_datetime("02/17/2015", utc=True)
end_day = pd.to_datetime("12/31/2016", utc=True)

df_earthquakes = df_earthquakes[df_earthquakes["Date"].between(start_day, end_day)]

fig = go.Figure(
    data=[
        go.Scatter(
            x=df_apple_stock["Date"],
            y=df_apple_stock["AAPL.Close"],
            customdata=df_apple_stock,
            mode="lines",  # lines+markers
            name="AAPL.Close",
            # marker=dict(
            #     size=5,
            #     line=dict(width=2, color='DarkSlateGrey')
            # ),
            # hoveron='points',
            hovertemplate="<b>%{x}</b><br>"
            + "open: %{customdata[1]:$.2f} <br>"
            + "close: %{y:$.2f} <br>"
            + "high: %{customdata[2]:$.2f} <br>"
            + "low: %{customdata[3]:$.2f} <br>"
            + "volume: %{customdata[5]:,}"
            # '<extra>test</extra>'
        )
    ]
)

fige = px.scatter(
    df_earthquakes,
    x="Date",
    y="Magnitude",
    color="Magnitude",
    color_continuous_scale="reds",
)

fig2 = make_subplots(specs=[[{"secondary_y": True}]])

fig2.add_trace(fig.data[0])
fig2.add_trace(fige.data[0], secondary_y=True)

fig2.update_layout(coloraxis=fige.layout.coloraxis).update_layout(coloraxis={"colorbar":{"y":.4}})

enter image description here

alternative for earthquakes

  • inspired by @vestland answer
  • earthquake data can be summarised first with pandas, frequency is not daily so aggregate to daily
  • also have filtered / removed days that have fewer that 3 earthquakes
  • more on color and on size
fige = px.scatter(
    df_earthquakes.groupby(df_earthquakes["Date"].dt.date).agg(
    Magnitude=("Magnitude", "max"), Count=("Date", "count")
).reset_index().loc[lambda d: d["Count"].gt(3)],
    x="Date",
    y="Magnitude",
    color="Magnitude",
    size="Count",
    color_continuous_scale="rdylgn_r",
)

enter image description here

Rob Raymond
  • 29,118
  • 3
  • 14
  • 30