
I want to plot an infinite (never-ending) line through two points that are stored in a pandas Series. I can plot a standard line segment between the points, but I don't want the line to "end" at the second point; it should continue past it. In addition, I would like to extract the values of this extended line into a new dataframe so that I can see what line value corresponds to a given x value.

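import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from scipy.signal import argrelextrema

data = yf.download("AAPL", start="2021-01-01", interval="1d").drop(columns=['Adj Close'])
data = data[30:].rename(columns={"Open": "open", "High": "high", "Low": "low", "Close": "close", "Volume": "volume"})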
local_max = argrelextrema(data['high'].values, np.greater)[0]
local_min = argrelextrema(data['low'].values, np.less)[0]
highs = data.iloc[local_max,:]
lows = data.iloc[local_min,:]

highesttwo = highs["high"].nlargest(2)
lowesttwo = lows["low"].nsmallest(2)

fig = plt.figure(figsize=[10,7])
data['high'].plot(marker='o', markevery=local_max)
data['low'].plot(marker='o', markevery=local_min)
highesttwo.plot()
lowesttwo.plot()
plt.show()

Currently my plot looks like this:

Current result

However, I want it to look like this, and I also want to be able to get the value of the line for the corresponding x value.

sword134

1 Answer


This can be done in a few steps as shown in the following example where the lines are computed with element-wise operations (i.e. vectorized) using the slope-intercept form of the line equation.

The stock data has a frequency based on the opening dates of the stock exchange. pandas does not automatically recognize this frequency, so the .plot method draws the x-axis as a continuous range of dates and includes the days with no data. This can be avoided by setting the argument use_index=False so that the x-axis uses integers starting from zero instead.

The challenge is to then create nicely formatted tick labels. The following example attempts to imitate the pandas tick format by using list comprehensions to select the tick locations and format the labels. These will need to be adjusted if the date range is significantly lengthened or shortened.

import numpy as np                      # v 1.19.2
import pandas as pd                     # v 1.2.3
import matplotlib.pyplot as plt         # v 3.3.4
from scipy.signal import argrelextrema  # v 1.6.1
import yfinance as yf                   # v 0.1.54

# Import data
data = (yf.download('AAPL', start='2021-01-04', end='2021-03-15', interval='1d')
         .drop(columns=['Adj Close']))
data = data.rename(columns={'Open': 'open', 'High': 'high', 'Low': 'low',
                            'Close': 'close', 'Volume': 'volume'})

# Extract points and get appropriate x values for the points by using
# reset_index for highs/lows
local_max = argrelextrema(data['high'].values, np.greater)[0]
local_min = argrelextrema(data['low'].values, np.less)[0]
highs = data.reset_index().iloc[local_max, :]
lows = data.reset_index().iloc[local_min, :]
htwo = highs['high'].nlargest(2).sort_index()
ltwo = lows['low'].nsmallest(2).sort_index()

# Compute slope and y-intercept for each line
slope_high, intercept_high = np.polyfit(htwo.index, htwo, 1)
slope_low, intercept_low = np.polyfit(ltwo.index, ltwo, 1)

# Create dataframe for each line by using reindexed htwo and ltwo so that the
# index extends to the end of the dataset and serves as the x variable then
# compute y values
# High
line_high = htwo.reindex(range(htwo.index[0], len(data))).reset_index()
line_high.columns = ['x', 'y']
line_high['y'] = slope_high*line_high['x'] + intercept_high
# Low
line_low = ltwo.reindex(range(ltwo.index[0], len(data))).reset_index()
line_low.columns = ['x', 'y']
line_low['y'] = slope_low*line_low['x'] + intercept_low

# Plot data using pandas plotting function and add lines with matplotlib function
fig = plt.figure(figsize=[10,6])
ax = data['high'].plot(marker='o', markevery=local_max, use_index=False)
data['low'].plot(marker='o', markevery=local_min, use_index=False)
ax.plot(line_high['x'], line_high['y'])
ax.plot(line_low['x'], line_low['y'])
ax.set_xlim(0, len(data)-1)

# Set major and minor tick locations
tks_maj = [idx for idx, timestamp in enumerate(data.index)
           if (timestamp.month != data.index[idx-1].month) | (idx == 0)]
tks_min = range(len(data))
ax.set_xticks(tks_maj)
ax.set_xticks(tks_min, minor=True)

# Format major and minor tick labels
labels_maj = [ts.strftime('\n%b\n%Y') if (data.index[tks_maj[idx]].year
              != data.index[tks_maj[idx-1]].year) | (idx == 0)
              else ts.strftime('\n%b') for idx, ts in enumerate(data.index[tks_maj])]
labels_min = [ts.strftime('%d') if (idx+3)%5 == 0 else ''
              for idx, ts in enumerate(data.index[tks_min])]
ax.set_xticklabels(labels_maj)
ax.set_xticklabels(labels_min, minor=True)

plt.show()

infinite_lines
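
The slope-intercept step can also be checked on its own, without downloading any data. Here is a minimal sketch with made-up prices. The helper `extend_line` is hypothetical (it is not part of the code above) and it assumes the two points sit at different integer x positions.

```python
import numpy as np
import pandas as pd


def extend_line(x1, y1, x2, y2, n_rows):
    """Return the line through (x1, y1) and (x2, y2), evaluated at every
    integer x from x1 up to n_rows - 1.

    Hypothetical helper for illustration only; assumes x1 != x2.
    """
    slope = (y2 - y1) / (x2 - x1)       # rise over run
    intercept = y1 - slope * x1         # solve y = slope*x + intercept for intercept
    x = np.arange(x1, n_rows)
    return pd.DataFrame({'x': x, 'y': slope * x + intercept})


# Made-up example: two highs at integer positions 2 and 6 in a 10-row dataset
line = extend_line(2, 140.0, 6, 138.0, n_rows=10)
print(line)
```

The resulting dataframe passes through both input points and keeps going to the last row, which is the same idea as `line_high` and `line_low` above. Passing a larger `n_rows` extends the line past the end of the data.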



You can find more examples of tick formatting here and here in Solution 1.

Date string format codes
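
For the second part of the question (getting the line value for a given x), one possible approach is to index the computed line by the original dates so that it can be looked up by market day. This is only a sketch: the dates, the prices, and the slope and intercept below are made up and stand in for the downloaded data and the `np.polyfit` output. Unlike `line_high` above, it evaluates the line at every row position rather than starting at the first of the two points.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the downloaded data: market days with one holiday removed
dates = pd.bdate_range('2021-02-08', '2021-02-19').drop(pd.Timestamp('2021-02-15'))
data = pd.DataFrame({'high': np.linspace(130.0, 138.0, len(dates))}, index=dates)

# Assumed values standing in for the slope and intercept returned by np.polyfit
slope_high, intercept_high = -0.5, 141.0

# Evaluate the line at each integer row position and label the result by date
line_high_by_date = pd.Series(slope_high * np.arange(len(data)) + intercept_high,
                              index=data.index, name='line_high')

# Value of the line on a given market day
value = line_high_by_date.loc[pd.Timestamp('2021-02-16')]
print(value)
```

Because the x variable is the row position, the holiday does not leave a gap in the line: 16 February is simply the position right after 12 February.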

Patrick FitzGerald
  • This seems to work, however there are a lot of cases where the result will look like this: https://imgur.com/73HPGhf – sword134 Mar 11 '21 at 20:59
  • so what exactly is the solution for this? How do I not plot via matplotlib and still preserve a proper line as well as making sure that the dataframe containing the plotting points is actually correct? – sword134 Mar 12 '21 at 11:37
  • I'm using an exact copy-paste of the code. I've put it in this pastebin, and I am still getting squiggly lines. Simply change the tail(number) on line 15 to test the code with different data input. https://pastebin.com/FsjMevxR – sword134 Mar 12 '21 at 15:27
  • @sword134 Thank you for sharing your code. I see that my mistake was to not test my answer with yf data, as I avoid extra package imports to make the code in my answers more sustainable (there have been issues with yf in the past). My apologies for this. The issue is that pandas tries to infer the frequency of the datetime index. In your question, the sample `data[30:]` happens to be short enough that it contains only calendar days that are recognized by pandas as corresponding to a 'business day frequency' (as used in my answer). – Patrick FitzGerald Mar 12 '21 at 23:03
  • So pandas plots the data with that frequency, plotting only dates contained in the datetime index and formatting the labels nicely. Now if you change the sample to `data[29:]`, you will see that the labels look different (matplotlib style) and that the line is longer because it now contains weekends. Why? Because Feb 15 is a holiday for the NYSE so that date is not included in the table. This causes pandas to not recognize the frequency of the data anymore so it uses instead matplotlib defaults to process the datetime index for the x-axis. – Patrick FitzGerald Mar 12 '21 at 23:03
  • Now my question to you is what would you prefer as a working solution? 1. Plot all the days of the year and have straight lines that have data points on weekends and holidays even if there is no market data for those days (if that makes sense for the purpose for which you need those lines). 2. Plot only market days, which I believe makes more sense but the only downside is that the labels need to be customized and the code for that needs to be adjusted if the time range is significantly lengthened/shortened. 3. Have both solutions. – Patrick FitzGerald Mar 12 '21 at 23:04
  • Sorry about not responding earlier, it's been a busy weekend. I am aiming for this to be a screener of sorts, so it is going to run through around 500 or so stock symbols, which means I can't correct for each of them individually. I think only plotting market days makes more sense. Isn't there just a way to turn the datetime index into a numerical one and fix it that way? – sword134 Mar 15 '21 at 10:29
  • @sword134 I have edited my answer to replace the date units on the x-axis with a range of integers starting from zero. I also noticed that I had forgotten to sort `htwo` and `ltwo` by index. Now the lines should be displayed correctly in all cases. Let me know if this is not the case or if anything is unclear. – Patrick FitzGerald Mar 16 '21 at 14:04
  • FitzGerald. Thank you so much! This seems to work flawlessly so far. Again thanks a ton, I've been scratching my head a fair bit over this little problem. – sword134 Mar 16 '21 at 17:49
  • @sword134 Happy to have been of help! – Patrick FitzGerald Mar 16 '21 at 18:52
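
The frequency issue described in the comments above can be illustrated with a small sketch. It uses `pd.infer_freq` as a stand-in for the inference that happens during plotting; that the plotting code infers the frequency in exactly this way is an assumption, and the date range is made up to include the 15 February 2021 holiday mentioned above.

```python
import pandas as pd

# Two full weeks of business days (Mon-Fri), including Monday 15 February
full = pd.bdate_range('2021-02-08', '2021-02-19')
freq_full = pd.infer_freq(full)

# The same range with the 15 February market holiday removed
with_gap = full.drop(pd.Timestamp('2021-02-15'))
freq_gap = pd.infer_freq(with_gap)

print('full range:  ', freq_full)
print('with holiday:', freq_gap)
```

With the holiday removed, the gaps between consecutive dates no longer fit a business-day pattern, so no frequency is inferred. Plotting against integer positions with use_index=False avoids depending on this inference at all.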