Pandas filtering columns with timestamps that are aligned every query/function defined minutes

Question

I am plotting stock OHLC data to mplfinance and customising the x-axis amd to do so I pass an array of indicies from my dataframe, this however is question to refactor my code and use pandas efficiently.

I loop through the data and every 15 minute aligned timestamp (i.e. on-the-hour or 15/30/45 minutes-past) I would want to print a label/tick.

Instead of using external arrays (tickLocations and tickLabels arrays in my code below), I would look to add an additional column from this data called XAxisTick and set True where a tick is required (please scroll right to show the added column):

Timestamp                   Volume  VW      Open    Close   High    Low     Time            Num     XAxisTick       
2021-12-21 05:00:00-05:00   1370.0  22.5    22.5    22.5    22.5    22.5    1640080800000   6       True
2021-12-21 05:15:00-05:00   2790.0  22.3211 22.32   22.32   22.33   22.32   1640081700000   11      True
2021-12-21 05:30:00-05:00   3000.0  22.231  22.27   22.22   22.27   22.22   1640082600000   11      True
2021-12-21 05:35:00-05:00   202.0   22.1713 22.16   22.16   22.16   22.16   1640082900000   4       False
2021-12-21 06:00:00-05:00   125.0   22.17   22.17   22.17   22.17   22.17   1640084400000   1       True
2021-12-21 06:05:00-05:00   10000.0 22.2    22.2    22.2    22.2    22.2    1640084700000   17      False
2021-12-21 06:10:00-05:00   8969.0  22.1004 22.12   22.1    22.12   22.1    1640085000000   22      False
2021-12-21 06:15:00-05:00   4409.0  22.0664 22.1    22.06   22.1    22.06   1640085300000   7       True
2021-12-21 06:20:00-05:00   8300.0  22.0437 22.06   22.01   22.06   22.01   1640085600000   10      False
2021-12-21 06:25:00-05:00   13865.0 21.9568 22.06   21.91   22.06   21.91   1640085900000   31      False
2021-12-21 06:35:00-05:00   4000.0  21.9555 21.94   21.96   21.96   21.94   1640086500000   9       False
2021-12-21 06:40:00-05:00   2594.0  21.92   21.92   21.92   21.92   21.92   1640086800000   4       False
2021-12-21 06:45:00-05:00   751.0   21.9513 21.97   21.91   21.97   21.91   1640087100000   5       True
2021-12-21 06:50:00-05:00   306.0   21.9325 21.9    21.95   21.95   21.9    1640087400000   4       False
2021-12-21 06:55:00-05:00   300.0   21.9333 21.96   21.92   21.96   21.92   1640087700000   2       False
2021-12-21 07:00:00-05:00   13979.0 21.9129 21.96   21.93   21.96   21.9    1640088000000   60      True

Additionally I would like to have this adjustable for different time spans, i.e. data that's normally spaced every 15 minutes to have ticks every 60 minutes, or hourly data to have ticks every 4 hours (but keeping with the above, only if a data point exists there for the label). Please note i am not talking about resampling that dataframe will always be and plotted as-is, I am looking only to define where ticks are created via the XAxisTick boolean depending on the tick generation strategy.

Once I get the above, I believe Pandas would then enable me to do more advanced filtering, like ignoring labels in plot image below at 05:00/05:15/05:30 squished on the plot due to minimal spacing, in that case would want at least 1 non-label row with XAxisTick=false prior - thus dropping the label at 05:15 but leaving 05:00 and 05:30.

This label filtering is what set my off in this jouney, however my pandas skills are lacking - thanks in advance for the time and assistance.

import mplfinance as mpf
import pandas as pd
import datetime

def getTimestampTickFrequency(df):
    # get most common interval in minutes
    mode = df.index.to_series().diff().astype('timedelta64[m]').astype('Int64').mode()[0]
    
    if mode==5:
        return 15 # for 5 minutes, tick every 15 mins
    elif mode==15:
        return 60 # for 15 minute data, tick every hour
    elif mode==30:
        return 120 # for 30 minute data, tick every 2 hours
    
    return mode

def getTickLocationsAndLabels(df):
    tickLocations = []
    tickLabels = []
    
    tickFrequencyMinutes = getTimestampTickFrequency(df)
    entireTickRange = pd.date_range(start=df.index[0], end=df.index[-1], freq=f'{tickFrequencyMinutes}T')
    
    # get indexes of data frame that match the ticks, if they exist
    for tick in entireTickRange:
        print(tick)
        try:
            found = df.index.get_loc(tick)            
            tickLocations.append(found)
            tickLabels.append(tick.time().strftime('%H:%M'))
            
        except KeyError:            
            pass # ignore
            
    return tickLocations, tickLabels

df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
tickLocations, tickLabels = getTickLocationsAndLabels(df5trimmed)
fig, axlist = mpf.plot(df5trimmed,style='yahoo', figsize=(48,24), type='candlestick', volume=True, tight_layout=True, returnfig=True)
axlist[-2].xaxis.set_ticks(tickLocations)
axlist[-2].set_xticklabels(tickLabels)

# Display:
mpf.show()

mplfinance does not yet support user control over tick locations, so presently what you are doing is the way to do it. I *am* working on an enhancment to allow user control over tick locations, and this question is a great use-case to consider. It's not entirely clear to me what you are asking here. Possibly Pandas **`resample().agg()`** may help. There is an example here: https://github.com/matplotlib/mplfinance/blob/master/examples/scratch_pad/pandas_resample_test.ipynb and more documentation here https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html Lmk. — Daniel Goldfarb, Jan 01 '22 at 23:49
There is another example of resampling [here](https://stackoverflow.com/a/66053681/1639359), but with shorter time periods (more similar to what you are using). You mentioned **filtering** but I am thinking perhaps **resampling** will help you accomplish what you want. Let me know if that's not the case. — Daniel Goldfarb, Jan 01 '22 at 23:54
Hi Daniel, thanks for the time to reply and help so far. Its not a sampling issue, its more to push my knowledge of pandas - by adding an additional column XAxisTick on the dataframe, which would be a marker boolean to determine if we will show a tick for that row/candle. Keeping this all in Pandas would allow for additional queries to be ran when getting the tick locations (as opposed to keeping tick locations in a seperate/external array outside of the dataframe). I have updated the question to give some clarity. Thanks again for the help. — morleyc, Jan 02 '22 at 11:15

score 0 · Answer 1 · answered Jan 02 '22 at 14:34

If all you want to do is add your new data as new columns to your dataframe, the simplest way to add a new column to a dataframe is to simply reference the new column in an assignment:

For example, something like this:

df['XAxisTick'] = tickLocations

To use this method, len(tickLocations) must be == len(df)

Therefore if you have any gaps in tickLocations you should fill them with either None or float('nan') or some other value that you choose to indicate no data for that location. This can easily be done in the loop where you are creating the tickLocations list, with an appropriate if and assignment of None or whatever other "missing data" value you choose.

Note, in the above assignment to the dataframe, tickLocations may be an list, tuple, ndarray, or pd.Series, as long as it is the correct length.

There are other methods for adding a column to a dataframe. Above is the simplest method. Not 100% sure if this was the only question you were asking, but if so you could have found the answer here. Lmk.

Pandas filtering columns with timestamps that are aligned every query/function defined minutes

1 Answers1