How to plot data from two different DataFrames with multiple duplicates?

Question

I am trying to create a chart to plot the historical data of apple stock alongside earthquake occurrences. I have two data frames, one with historical apple stock data and another with historical earthquake data. I want to display each earthquake occurrence as a marker or shape relative to the apple stock price on that date.

Question

How do I plot the earthquake events as markers or shapes relative to the apple chart?
How do I handle the markers or shapes with multiple earthquakes and prevent them from overlapping or overwriting?

Apple Data

	Date	AAPL.Open	AAPL.High	AAPL.Low	AAPL.Close	AAPL.Volume	AAPL.Adjusted	dn	mavg	up	direction
0	2015-02-17 00:00:00+00:00	127.49	128.88	126.92	127.83	63152400	122.905	106.741	117.928	129.114	Increasing
1	2015-02-18 00:00:00+00:00	127.63	128.78	127.45	128.72	44891700	123.761	107.842	118.94	130.038	Increasing
2	2015-02-19 00:00:00+00:00	128.48	129.03	128.33	128.45	37362400	123.501	108.894	119.889	130.884	Decreasing
3	2015-02-20 00:00:00+00:00	128.62	129.5	128.05	129.5	48948400	124.511	109.785	120.764	131.742	Increasing
4	2015-02-23 00:00:00+00:00	130.02	133	129.66	133	70974100	127.876	110.373	121.72	133.068	Increasing

Earthquake Data

	Date	Latitude	Longitude	Magnitude
22539	2015-02-17 00:00:00+00:00	40.1095	141.891	5.5
22540	2015-02-17 00:00:00+00:00	39.5696	143.583	5.5
22541	2015-02-18 00:00:00+00:00	8.3227	-103.159	5.5
22542	2015-02-18 00:00:00+00:00	8.285	-103.054	5.5
22543	2015-02-18 00:00:00+00:00	-10.7598	164.122	6.1

My current code

import pandas as pd
import plotly.graph_objects as go

if __name__ == '__main__':
    # Create dataframe of historical apple stock and earth quakes
    df_apple_stock = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
    df_earthquakes = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')
    
    # Convert data column to UTC datetime
    df_apple_stock['Date'] = pd.to_datetime(df_apple_stock['Date'], utc=True)
    df_earthquakes['Date'] = pd.to_datetime(df_earthquakes['Date'], utc=True)
    
    # Trim earthquake data to be only of 2015-2016
    start_day = pd.to_datetime('02/17/2015', utc=True)
    end_day = pd.to_datetime('12/31/2016', utc=True)    
    
    df_earthquakes = df_earthquakes[df_earthquakes['Date'].between(start_day, end_day)]  
    
    fig = go.Figure(data=[go.Scatter(x=df_apple_stock['Date'],
                                     y=df_apple_stock['AAPL.Close'],
                                     customdata=df_apple_stock,
                                     mode='lines',  # lines+markers
                                     # marker=dict(
                                     #     size=5,
                                     #     line=dict(width=2, color='DarkSlateGrey')
                                     # ),
                                     # hoveron='points',
                                     hovertemplate=
                                     '<b>%{x}</b><br>' +
                                     'open: %{customdata[1]:$.2f} <br>' +
                                     'close: %{y:$.2f} <br>' +
                                     'high: %{customdata[2]:$.2f} <br>' +
                                     'low: %{customdata[3]:$.2f} <br>' +
                                     'volume: %{customdata[5]:,}'
                                     # '<extra>test</extra>'
                                     )])
    
    fig.show()

Example Desired Outcome

What I tried

I tried iterating over each earthquake row and adding an annotation; however, this has issues:
- I am unable to figure out how to position the earthquake annotations relative to the Apple stock price
- If there are multiple earthquakes in one day then only one of them will be shown
- It can take a long time to iterate through every single row in a larger data set

for _, row in df_earthquakes.iterrows():
    fig.add_annotation(font=dict(color='red', size=15),
                       x=str(row.Date),
                       y=125,  # how do I reference 'y' from apple stock price?
                       showarrow=False,
                       text="Event",
                       align="left",
                       hovertext=("Date: " + str(row.Date) + "<br>" +
                                  "Magnitude: " + str(row.Magnitude) + "<br>" +
                                  "Latitude: " + str(row.Latitude) + "<br>" +
                                  "Longitude: " + str(row.Longitude)),
                     xanchor='left')

Plotting two traces within a Scatter plot and using the %{xother}

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_apple_stock['Date'],
    y=df_apple_stock['AAPL.Close'],
    fill='tozeroy',
    hovertemplate="%{y}%{_xother}"
))

fig.add_trace(go.Scatter(
    x=df_earthquakes['Date'],
    y=df_earthquakes['Magnitude'],
    fill='tonexty',
    hovertemplate="%{y}%{_xother}",
))

fig.update_layout(hovermode="x unified")

I tried looking up how to add data from multiple periods of data and came across Hover Templates with Mixtures of Period data, but I was not able to get it working as I would like
I tried reading into the documentation, markers, annotations, shared axis on subplots

Subject of [a meta question](https://meta.stackoverflow.com/questions/411419/why-cant-i-edit-my-post) (support). — Peter Mortensen, Sep 10 '21 at 11:04
"Apple" here is as in *[Forrest Gump](https://en.wikipedia.org/wiki/Forrest_Gump)*. — Peter Mortensen, Sep 10 '21 at 11:10

vestland · Accepted Answer · 2021-09-09T21:03:33.600

I've set up a suggestion that should take care of your concerns. I'm using a built-in dataset and some random choices for duplicated dates. If you would like me to work on a sample of your actual dataset, please include that using the approach described here.

First suggestion:

1. Main trace is added to figure with fig.add_traces(go.Scatter)

2. Dates with quakes are arranged in two different datasets; one showing dates with single events, and one with duplicate dates.

3. Duplicate dates are organized in multiple = quakes[quakes.date.duplicated()] for which each and every record are assigned to a single trace. This will let you set up different symbols and hoverdata as you wish.

4. Values that belong to duplicated dates are offset on the y-axis compared to each other to ensure that the corresponding annotations are not overlapping or overwritten.

If this comes close to your desired result, we can talk details when you find the time.

Plot:

Code 1

# imports
import pandas as pd
import plotly.express as px
import random
import numpy as np
import plotly.graph_objects as go
from plotly.validators.scatter.marker import SymbolValidator
from itertools import cycle

np.random.seed(123)

# data
df = px.data.stocks()
df = df.drop(['GOOG', 'AMZN', 'NFLX', 'FB'], axis = 1).tail(150)

# simule
quakes =pd.DataFrame()

dates = [random.choice(df.date.values) for obs in range(0, 6)]
dates.extend([df.date.iloc[2], df.date.iloc[2], df.date.iloc[6], df.date.iloc[6], df.date.iloc[6]])

# synthetic data for earthquakes
quakes['date'] = dates
quakes['magnitude'] = [np.random.uniform(5,7) for obs in quakes.date]
quakes = pd.concat([quakes, quakes.groupby('date').cumcount().to_frame('dupes')], axis = 1)

# find dates with multiple quakes
multiple = quakes[quakes.date.duplicated()].sort_values('date').reset_index()#.sorted()

# find dates where only one quake occurs (to keep number of traces at a minimum)
single = quakes[~quakes.date.duplicated()].sort_values('date').reset_index()
single = pd.merge(df, single, on = 'date', how = 'right')

fig = go.Figure(go.Scatter(x = df['date'], y = df['AAPL'], name = 'Apple'))
fig.add_traces(go.Scatter(x=single['date'], y =single['AAPL'],
                          mode = 'markers',
                          name = 'days with quakes',
                          showlegend = True,
                          marker = dict(symbol = 'square', size = single['magnitude']**2)))

symbols = cycle(['circle', 'hexagon', 'diamond', 'star'])
annotations = []
for i, r in multiple.iterrows():
    fig.add_traces(go.Scatter(x=[r['date']], y = df[df['date']==r['date']]['AAPL']*(1 + r['dupes']/10),
                              mode = 'markers',
                              name = r['date'],
                              marker = dict(symbol = next(symbols), size = r['magnitude']**2)))
    annotations.append([r['date'], df[df['date']==r['date']]['AAPL']*(1 + r['dupes']/10), r['magnitude']])

# annotate single events
for i, q in enumerate(fig.data[1].x):
        fig.add_annotation(x=q, y = fig.data[1].y[i],
                       text = str(fig.data[1].y[i])[:3], showarrow = False,
                       font = dict(size = 10),
                       yref = 'y',
                       ay=0)
    

# annotate duplicates
for a in annotations:
    fig.add_annotation(x=a[0], y = a[1].item(),
                       text = str(a[2])[:4], showarrow = False,
                       font = dict(size = 10),
                       yref = 'y',
                       ay=0)
fig.show()

This is approach is what I am trying to accomplish. However, I am running into some issues in which I am not sure how to fix. There are dates where earthquakes occur when the stock market is closed (such as weekends or holidays) and there is no axis to plot the earthquakes. — J. Doe, Sep 10 '21 at 09:24
I also switched the single's annotation from `text = str(fig.data[1].y[i])[:3]` to `text=str(single['magnitude'][i])[:4]` to show the magnitude on the first occurrence. Can you explain what `df[df['date'] == r['date']]['AAPL'] * (1 + r['dupes'] / 10)` and `single['magnitude']**2` does? — J. Doe, Sep 10 '21 at 09:51
@J.Doe Sure! But lets not flood the comments section, and rather [chat here](https://chat.stackoverflow.com/rooms/236964/room-for-vestland-and-j-doe) — vestland, Sep 10 '21 at 10:39
@J.Doe But the short story is this: `df[df['date'] == r['date']]['AAPL']` retrieves the value of `AAPL` on *one* date where there are several quakes. `(1 + r['dupes'] / 10)` then **offset** the indicator and annotation to make sure that they don't overwrite eachother. In the case where there's *one* duplicate, `(1 + r['dupes'] / 10)` gets the value `1.1` - in other words and increase of 10%. In the case where there are *two* duplicates, the second duplicate gets the value `1.2`. A suitable value will of course depend on the range of the `y-axis`. — vestland, Sep 10 '21 at 10:58
@J.Doe Regarding quakes that occur on dates when the stock market is closed, I would consider reassigning those to the first date the market is open after the date. If the quakes have any effect on the prices, this is when you would see them. Another option is to merge your stock data to a series of *continuous* dates and plot the quakes against that. — vestland, Sep 10 '21 at 13:09

Rob Raymond · Answer 2 · 2021-09-09T10:19:42.720

you can plot on a secondary y-axis two y-axis
have used plotly express to plot earthquakes, then transferred traces and layout to other all figure

from plotly.subplots import make_subplots
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px


df_apple_stock = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv"
)
df_earthquakes = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv"
)

# Convert data column to UTC datetime
df_apple_stock["Date"] = pd.to_datetime(df_apple_stock["Date"], utc=True)
df_earthquakes["Date"] = pd.to_datetime(df_earthquakes["Date"], utc=True)

# Trim earthquake data to be only of 2015-2016
start_day = pd.to_datetime("02/17/2015", utc=True)
end_day = pd.to_datetime("12/31/2016", utc=True)

df_earthquakes = df_earthquakes[df_earthquakes["Date"].between(start_day, end_day)]

fig = go.Figure(
    data=[
        go.Scatter(
            x=df_apple_stock["Date"],
            y=df_apple_stock["AAPL.Close"],
            customdata=df_apple_stock,
            mode="lines",  # lines+markers
            name="AAPL.Close",
            # marker=dict(
            #     size=5,
            #     line=dict(width=2, color='DarkSlateGrey')
            # ),
            # hoveron='points',
            hovertemplate="<b>%{x}</b><br>"
            + "open: %{customdata[1]:$.2f} <br>"
            + "close: %{y:$.2f} <br>"
            + "high: %{customdata[2]:$.2f} <br>"
            + "low: %{customdata[3]:$.2f} <br>"
            + "volume: %{customdata[5]:,}"
            # '<extra>test</extra>'
        )
    ]
)

fige = px.scatter(
    df_earthquakes,
    x="Date",
    y="Magnitude",
    color="Magnitude",
    color_continuous_scale="reds",
)

fig2 = make_subplots(specs=[[{"secondary_y": True}]])

fig2.add_trace(fig.data[0])
fig2.add_trace(fige.data[0], secondary_y=True)

fig2.update_layout(coloraxis=fige.layout.coloraxis).update_layout(coloraxis={"colorbar":{"y":.4}})

alternative for earthquakes

inspired by @vestland answer
earthquake data can be summarised first with pandas, frequency is not daily so aggregate to daily
also have filtered / removed days that have fewer that 3 earthquakes
more on color and on size

fige = px.scatter(
    df_earthquakes.groupby(df_earthquakes["Date"].dt.date).agg(
    Magnitude=("Magnitude", "max"), Count=("Date", "count")
).reset_index().loc[lambda d: d["Count"].gt(3)],
    x="Date",
    y="Magnitude",
    color="Magnitude",
    size="Count",
    color_continuous_scale="rdylgn_r",
)