I'm trying to create a graph of dashed lines that represent the length of an event for each of my hashes. My dataframe is as follows:

                            hash    event        start          end
0174FAA018E7FAE1E84469ADC34EF666 baseball 00:00:00:000 00:00:00:500
0174FAA018E7FAE1E84469ADC34EF666 baseball 00:00:01:000 00:00:01:500
0174FAA018E7FAE1E84469ADC34EF666 cat      00:00:01:500 00:00:02:500
AF4BB75F98579B8C9F95EABEC1BDD988 baseball 00:00:01:000 00:00:01:500
AF4BB75F98579B8C9F95EABEC1BDD988 cat      00:00:01:500 00:00:02:500
AF4BB75F98579B8C9F95EABEC1BDD988 cat      00:00:03:200 00:00:05:250
AF4BB75F98579B8C9F95EABEC1BDD988 cat      00:00:03:000 00:00:04:350

I want something similar to the answer here: Change spacing of dashes in dashed line in matplotlib, where the hashes are on the y-axis and the time intervals are on the x-axis, with event types color-coded and the lines broken up by blank space wherever there is no event for that time interval.

This is what I've tried so far, but it's not working:

fig,ax = plt.subplots()
ax.plot([0, df.end], [df.hash], linestyle='--', dashes=(5, 5)) 

See below for an example:

[Image: rough hand-drawn sketch of the desired graph]

adam
  • hmm, could you upload maybe a hand drawn chart that you want to build using matplotlib – Haleemur Ali Sep 07 '18 at 16:02
  • a line from point (x1,y1) to (x2,y2) in matplotlib is defined as `plt.plot([x1,x2],[y1,y2])`. – ImportanceOfBeingErnest Sep 07 '18 at 16:34
  • @ImportanceOfBeingErnest that's helpful and seems like it's going in the right direction. How would you automatically compute those for every start/stop for each event for each hash? – adam Sep 07 '18 at 17:01
  • I think they are already there in the dataframe. `df["start"][4]` is the fifth number in the start column. You would however need to convert the `00:00:01:500` to some number that can be plotted. – ImportanceOfBeingErnest Sep 07 '18 at 17:04
  • @HaleemurAli I edited the question - check the image – adam Sep 07 '18 at 17:10
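
The two suggestions in the comments above can be combined into a minimal sketch: convert each `HH:MM:SS:mmm` string to seconds, then issue one `plot([x1, x2], [y1, y2])` call per row. The helper names `to_seconds`, `build_segments` and `draw_segments` are hypothetical, and the rows are a subset of the sample data from the question.

```python
def to_seconds(stamp):
    """Convert an 'HH:MM:SS:mmm' string to a float number of seconds."""
    hours, minutes, seconds, millis = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


def build_segments(rows):
    """Return (hash, event, x1, x2, y) tuples; y is the row index of the hash."""
    y_of = {}
    segments = []
    for h, event, start, end in rows:
        y = y_of.setdefault(h, len(y_of))  # first-seen order decides the y position
        segments.append((h, event, to_seconds(start), to_seconds(end), y))
    return segments


def draw_segments(segments):
    """Draw one horizontal line per segment."""
    import matplotlib.pyplot as plt  # imported lazily; only this function needs it

    fig, ax = plt.subplots()
    for _, _, x1, x2, y in segments:
        ax.plot([x1, x2], [y, y], lw=10)  # a line from (x1, y) to (x2, y)
    return fig, ax


rows = [
    ("0174FAA018E7FAE1E84469ADC34EF666", "baseball", "00:00:00:000", "00:00:00:500"),
    ("0174FAA018E7FAE1E84469ADC34EF666", "cat", "00:00:01:500", "00:00:02:500"),
    ("AF4BB75F98579B8C9F95EABEC1BDD988", "cat", "00:00:03:200", "00:00:05:250"),
]
segments = build_segments(rows)
```

Calling `draw_segments(segments)` followed by `plt.show()` would then display the lines; with a dataframe, the rows could come from `df.itertuples(index=False)`.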

1 Answer

My first association with your request was matplotlib's broken_barh function, but so far I have not figured out how to plot timedeltas with it, which would be necessary there. Your plot can be done with plot too, so the code below has an if False: (attempt with broken_barh) / else: (plot version) structure. See for yourself.
I'll try to update the literally broken part as soon as I have an idea how to plot timedeltas in matplotlib...

Here's the code, which I hope helps:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from io import StringIO


def brk_str(s):       # just for nicer labeling with such long hashes
    return '\n'.join([s[8*i:8*(i+1)] for i in range(4)])


s = '''                            hash    event        start          end
0174FAA018E7FAE1E84469ADC34EF666 baseball 00:00:00:000 00:00:00:500
0174FAA018E7FAE1E84469ADC34EF666 baseball 00:00:01:000 00:00:01:500
0174FAA018E7FAE1E84469ADC34EF666 cat      00:00:01:500 00:00:02:500
AF4BB75F98579B8C9F95EABEC1BDD988 baseball 00:00:01:000 00:00:01:500
AF4BB75F98579B8C9F95EABEC1BDD988 cat      00:00:01:500 00:00:02:500
AF4BB75F98579B8C9F95EABEC1BDD988 cat      00:00:03:200 00:00:05:250
AF4BB75F98579B8C9F95EABEC1BDD988 cat      00:00:03:000 00:00:04:350'''

df = pd.read_table(StringIO(s), sep=r'\s+')   # raw string, so '\s' is not treated as an invalid escape sequence

# parse 'HH:MM:SS:mmm' strings; the date part defaults to 1900-01-01
df['start'] = pd.to_datetime(df['start'], format='%H:%M:%S:%f')
df['end'] = pd.to_datetime(df['end'], format='%H:%M:%S:%f')

df['dur'] = (df['end'] - df['start'])   # only needed if the broken_barh variant worked...

e_grpd = df.groupby('event')
# fixed y position per hash, so a hash missing from one event cannot shift the rows
hash_pos = {h: k for k, h in enumerate(df.groupby('hash').groups)}

fig, ax = plt.subplots()

for i, (e, ev) in enumerate(e_grpd):   # iterate over all events, providing a counter i, the name of every event e and its data ev
    last_color = None    # None lets matplotlib pick the next color of the cycle for this event
    for h, hv in ev.groupby('hash'):   # iterate over all hashes of this event, providing every hash h and its data hv
        k = hash_pos[h]   # y position of this hash
        if False:   # deliberately kept: broken_barh would save the innermost loop and would generally fit better I think...
            pass
            #ax.broken_barh(ev[['start', 'dur']].T, np.array([i*np.ones(len(ev))+k/10, .1*np.ones(len(ev))]).T)
        else:
            for a, b in zip(hv.start, hv.end):   # iterate over every single event per hash, providing start and stop time a and b
                first = last_color is None   # only the first line of each event gets a legend entry
                p = ax.plot([a, b], k*np.ones(2)+i/10, color=last_color, lw=15, label=e if first else '_nolegend_')
                last_color = p[0].get_color()    # reuse this color for the rest of the event to prevent color cycling


ax.set_yticks(range(len(df.groupby('hash').groups)))
ax.set_yticklabels(list(map(brk_str, df.groupby('hash').groups)))   # a list, not a lazy map object
ax.legend(ncol=2, bbox_to_anchor=[0, 0, 1, 1.1], loc=9, edgecolor='w')
plt.tight_layout()
plt.show()

Result with plt.plot:

[Image: the resulting plot]
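
For the `broken_barh` route mentioned at the top of this answer, one possible workaround is to sidestep timedeltas altogether and pass plain seconds. This is only a sketch under that assumption: the helper names `to_seconds`, `bars_by_hash` and `draw_bars` are made up for illustration, and the x-axis then shows seconds rather than formatted timestamps.

```python
from collections import defaultdict


def to_seconds(stamp):
    """Convert an 'HH:MM:SS:mmm' string to a float number of seconds."""
    hours, minutes, seconds, millis = (int(p) for p in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


def bars_by_hash(rows):
    """Group rows into {(hash, event): [(start, width), ...]}, the shape broken_barh expects."""
    bars = defaultdict(list)
    for h, event, start, end in rows:
        x0, x1 = to_seconds(start), to_seconds(end)
        bars[(h, event)].append((x0, x1 - x0))
    return dict(bars)


def draw_bars(bars):
    """One broken_barh call per (hash, event) pair, one color per event."""
    import matplotlib.pyplot as plt  # imported lazily; only this function needs it

    hashes = sorted({h for h, _ in bars})
    events = sorted({e for _, e in bars})
    colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    fig, ax = plt.subplots()
    for (h, event), xranges in bars.items():
        k, i = hashes.index(h), events.index(event)
        # yrange is (ymin, height); events are offset slightly within each hash row
        ax.broken_barh(xranges, (k + i / 10 - 0.05, 0.1),
                       facecolors=colors[i % len(colors)])
    ax.set_yticks(range(len(hashes)))
    ax.set_yticklabels(hashes)
    ax.set_xlabel("seconds")
    return fig, ax


rows = [
    ("0174FAA018E7FAE1E84469ADC34EF666", "baseball", "00:00:00:000", "00:00:00:500"),
    ("0174FAA018E7FAE1E84469ADC34EF666", "baseball", "00:00:01:000", "00:00:01:500"),
    ("AF4BB75F98579B8C9F95EABEC1BDD988", "cat", "00:00:01:500", "00:00:02:500"),
]
bars = bars_by_hash(rows)
```

`draw_bars(bars)` followed by `plt.show()` would render the bars. Compared with the `plot` version, this saves the innermost loop because each call receives all intervals of one hash and event at once.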

SpghttCd
  • I think this is the right direction. I currently can only get a single hash to plot and then a list index out of range for the line `ax.plot([a, b], i*np.ones(2)+k/10, clrs[k], lw=15, label='_' if i>0 or n>0 else '' + e)` and I can't figure out what is causing the issue. Would you mind commenting on your code above as to what each line is referencing? I can mostly follow along but get a little lost inside the second `for` loop. – adam Sep 10 '18 at 13:50
  • The only list in there is `clrs`, which has only two entries because your example doesn't have more. So I assume you now test the code with more than two events? For testing purposes you can delete the `clrs`-kwarg completely, but for the plot to look correct, `clrs` needs to have as many color entries as you have events in your data. - And to your request: I'll add some more comments as soon as I have some time... – SpghttCd Sep 10 '18 at 14:28
  • Is there a way to change the clrs list to a colormap, similar to: http://pandas.pydata.org/pandas-docs/version/0.15.0/visualization.html#colormaps – adam Sep 10 '18 at 14:42
  • Of course there is. In the meantime I think it would be better to swap the grouping, i.e. group by event first and then by hash. This results in an automatically applied color cycle (at the first hash, which could then be reused for all others), so you don't have to worry about creating something beforehand. I'll update the code in that direction, if that's ok for you... Supplemental: the purpose of `clrs` was initially only to match your hand-drawn colors. – SpghttCd Sep 10 '18 at 15:05
  • I tried grouping by event and then hash but that made the y-axis the events rather than the hashes. Otherwise, removing the `clrs` list worked. Now I just need to figure out how best to get the display to work - I'm trying to rotate the timestamps 45 degrees just to make them readable. – adam Sep 10 '18 at 16:01
  • I'm getting the opposite results for some reason when I run the update, where the y_axis is the two labels and the legend has only two of my hashes. I think there's something wrong with the groupbys. – adam Sep 10 '18 at 18:27
  • Most recent update does the trick! thanks again for all the help. – adam Sep 10 '18 at 18:38
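
For the two follow-ups raised in the comments (colors taken from a colormap instead of a fixed `clrs` list, and rotated timestamps), a sketch along these lines may help. `colormap_positions`, `event_colors` and `rotate_time_labels` are hypothetical names, and `viridis` is just an example colormap.

```python
def colormap_positions(events):
    """Map each distinct event name to an evenly spaced position in [0, 1]."""
    names = sorted(set(events))
    if len(names) == 1:
        return {names[0]: 0.0}
    return {name: idx / (len(names) - 1) for idx, name in enumerate(names)}


def event_colors(events, cmap_name="viridis"):
    """Look up one RGBA color per event from a named matplotlib colormap."""
    import matplotlib.pyplot as plt  # imported lazily; only the plotting helpers need it

    cmap = plt.get_cmap(cmap_name)
    return {name: cmap(pos) for name, pos in colormap_positions(events).items()}


def rotate_time_labels(ax, angle=45):
    """Rotate the x tick labels so long timestamps stay readable."""
    for label in ax.get_xticklabels():
        label.set_rotation(angle)
        label.set_horizontalalignment("right")


positions = colormap_positions(["cat", "baseball", "cat"])
```

In the answer's loop this would amount to computing `colors = event_colors(df['event'])` once, passing `color=colors[e]` to `ax.plot`, and calling `rotate_time_labels(ax)` before `plt.tight_layout()`.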