I have a feeling there is a very simple way of doing this. I'm trying to plot a timeline of tasks running in an environment, with two plots on the same diagram:

  1. the task run times as a broken_barh
  2. an overall load curve based on the number of tasks active at each point in time (or a histogram), drawn with lower opacity or as a line.

In the example there are 6 tasks (A-F) running for various lengths, with different start times. They are plotted exactly as I need for item 1, in a Gantt-like chart with time on the X axis.

import numpy as np
import pandas as pd
# Notebook-only magic; remove this line when running as a plain script
%matplotlib inline
from matplotlib import pyplot as plt

cols=['ID','From','To']

df = pd.DataFrame([['A', 736758.993, 736758.995], ['B', 736758.995, 736758.998],
                   ['C', 736758.994, 736758.996], ['D', 736758.996, 736758.997],
                   ['E', 736758.996, 736758.997], ['F', 736758.995, 736758.996]],
                   columns=cols)

df['Diff'] = df['To']-df['From']

fig, ax = plt.subplots()
# One bar per task: x-extent is (start, width), y-extent is (bottom, height)
for i, row in df.iterrows():
    ax.broken_barh([(row['From'], row['Diff'])], (i - 0.4, 0.8), color=np.random.rand(3))

ax.xaxis_date()

To this I would like to add item 2: a curve showing the number of active tasks at each time (1 between 23:51 and 23:52, 2 between 23:52 and 23:53, etc., peaking around 23:54).

The problem is that I cannot just draw a histogram of the start times, since the tasks overlap in time. Do you know a decent way to create such a histogram?
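To make the target quantity concrete: the load at an instant t is the number of tasks whose interval contains t, which is not the same as counting start times. The sketch below uses the sample values from the DataFrame above and plain Python only. The helper name `active_at` and the choice of half-open intervals (a task counts from `From` up to, but not including, `To`) are assumptions made for illustration.

```python
# Sample tasks from the question: (ID, From, To)
tasks = [
    ("A", 736758.993, 736758.995), ("B", 736758.995, 736758.998),
    ("C", 736758.994, 736758.996), ("D", 736758.996, 736758.997),
    ("E", 736758.996, 736758.997), ("F", 736758.995, 736758.996),
]

def active_at(t, tasks):
    """Count tasks whose half-open interval [start, end) contains t."""
    return sum(1 for _id, start, end in tasks if start <= t < end)

# Probe the middle of each 0.001-wide slot to stay away from interval edges
probes = [736758.9935, 736758.9945, 736758.9955, 736758.9965, 736758.9975]
loads = [active_at(t, tasks) for t in probes]
print(loads)
```

A histogram of `From` alone would show three tasks starting in the third slot but would miss task B, which keeps running through the following slots.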

Mr. T
parszab
  • Don't you want to group your data first by ID or do you aim for a [Gantt chart](https://stackoverflow.com/a/18072543/8881141)? – Mr. T Mar 09 '18 at 10:06
  • I think this question can be improved a lot. Currently the code shows a Gantt chart type of plot, while this is **not at all** what you're asking about here. I would remove everything about the broken bar stuff, because that is what you already know how to do and it will not help at all with the envisioned task. – ImportanceOfBeingErnest Mar 09 '18 at 10:34
  • Sorry, there was a typo, the IDs are unique. I made some edits and tried to clarify. I need two plots on the same diagram: a Gantt-like chart plus a curve showing the load. I hope this clarifies. – parszab Mar 09 '18 at 13:34

1 Answer

I am pretty sure there are cleaner ways to approach this. The float math problems in particular were pretty annoying when creating the histogram. The first part is a simple one-liner, though: as suggested, use hlines and increase the linewidth to create the bar chart.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

df = pd.DataFrame([['A', 736758.993, 736758.995], ['B', 736758.995, 736758.998],
                   ['C', 736758.994, 736758.996], ['D', 736758.994, 736758.997],
                   ['E', 736758.997, 736758.998], ['F', 736758.995, 736758.999]],
                   columns = ['ID','From','To'])

#create two subplots with shared x axis
fig, (ax1, ax2) = plt.subplots(2, 1, sharex = True)
#plot1 - Gantt chart for individual IDs
ax1.hlines(df.ID, df.From, df.To, colors = cm.inferno(df.index/len(df)), linewidth = 20)

#plot 2 - make table of time series for each ID - multiply by 1000 to avoid float problems
hist_count = df.apply(lambda row: pd.Series(np.arange(1000 * row["From"], 1000 * row["To"])), axis = 1)
hist_count = pd.melt(hist_count)["value"].dropna().astype(int)
#find borders for bins 
min_time = hist_count.min(axis = 0)
max_time = hist_count.max(axis = 0)
#plot 2 histogram - add 0.0001 to prevent arbitrary binning due to float problems
ax2.hist(hist_count / 1000 + 0.0001, range = (min_time / 1000, (max_time + 1) / 1000), bins = max_time - min_time + 1)
ax2.xaxis_date()

plt.show()

Output from the sample data set: [image: Gantt-style chart on top, histogram of active task counts below]
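One possible alternative that avoids binning, and with it the `* 1000` and `+ 0.0001` workarounds, is to count events directly: add 1 at every start, subtract 1 at every end, and take a running sum over the sorted time points. The sketch below is not the method used above; it is a plain-Python illustration on the same sample data, and the function name `load_curve` is made up for this example.

```python
from collections import Counter

# Sample tasks from this answer: (ID, From, To)
tasks = [
    ("A", 736758.993, 736758.995), ("B", 736758.995, 736758.998),
    ("C", 736758.994, 736758.996), ("D", 736758.994, 736758.997),
    ("E", 736758.997, 736758.998), ("F", 736758.995, 736758.999),
]

def load_curve(tasks):
    """Return (times, counts): the number of active tasks from each time point on."""
    deltas = Counter()
    for _id, start, end in tasks:
        deltas[start] += 1   # a task becomes active
        deltas[end] -= 1     # a task finishes
    times, counts, running = [], [], 0
    for t in sorted(deltas):
        running += deltas[t]  # net change at this time point
        times.append(t)
        counts.append(running)
    return times, counts

times, counts = load_curve(tasks)
print(counts)
```

Each count holds from its time point until the next one, so the result is a step function. Assuming Matplotlib, it could be drawn over the Gantt axis with something like `ax1.twinx().step(times, counts, where="post")`, which would put both plots on the same diagram.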

Mr. T