I would like to combine the best of this and this question. Namely, I have a DataFrame that contains the test name, date of execution, and outcome, and I want to show how the percentage of failed cases decreased over time.

My data looks like this:

TestName;Date;IsPassed
test1;12/8/2016 9:44:30 PM;0
test1;12/8/2016 9:39:00 PM;0
test1;12/8/2016 9:38:29 PM;1
test1;12/8/2016 9:38:27 PM;1
test2;12/8/2016 5:05:02 AM;1
test3;12/7/2016 8:58:36 PM;0
test3;12/7/2016 8:57:19 PM;1
test3;12/7/2016 8:56:15 PM;1
test4;12/5/2016 6:50:49 PM;0
test4;12/5/2016 6:49:50 PM;0
test4;12/5/2016 3:23:09 AM;1
test4;12/4/2016 11:51:29 PM;1

And I was using this code to plot the cases separately:

fig, ax = plt.subplots()
passed = tests[tests.IsPassed == 1]
failed = tests[tests.IsPassed == 0]
passed_dates = mdates.date2num(passed.Date.astype(datetime))
failed_dates = mdates.date2num(failed.Date.astype(datetime))
ax.hist(passed_dates, bins=10, color='g')
ax.hist(failed_dates, bins=10, color='r')
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d.%m.%y'))
plt.show()

But now I would like to

  1. Divide the time span into a configurable number of buckets
  2. Count the number of test runs per bucket (without for loops, as there are a lot of entries in the DataFrame)
  3. Plot either a 100% area chart or a stacked histogram for each bucket, so that the count from step 2 is 100%

My problem right now is that hist(), which otherwise works perfectly, does the counting automatically, and I don't see a way to pass my own Y values to it.

Update

Here is what I'd like to accomplish (taken from another source): 100% Stacked columns

Nick Slavsky

1 Answer

plt.hist accepts a list of arrays as input; with the argument stacked=True, the histograms are drawn stacked on top of each other.

ax.hist([passed_dates, failed_dates], bins=10, stacked=True, label=["passed", "failed"])
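
To try the call above without the original DataFrame, here is a minimal sketch. The random arrays are made-up stand-ins for passed_dates and failed_dates (plain numbers, as date2num would return), and the Agg backend is only an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for passed_dates / failed_dates
passed_dates = rng.uniform(0, 100, size=300)
failed_dates = rng.uniform(0, 100, size=100)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist([passed_dates, failed_dates], bins=10,
                           stacked=True, label=["passed", "failed"])
ax.legend()

# With stacked=True the returned counts are cumulative across the inputs:
# counts[0] is passed per bin, counts[1] is passed + failed per bin.
total_runs = counts[-1].sum()
```

Both inputs share the same bin edges, which is what makes the bars stack cleanly.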


Using relative counts requires dividing each bin's count by the total number of runs in that bin. This functionality is not built into the hist function, so you need to calculate the histograms manually and then plot the result as a stacked bar plot.
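
The arithmetic can be sketched on a tiny made-up array first. The values and bin edges below are invented for illustration, and the guard against empty bins via np.divide is an addition of this sketch, not part of the original approach.

```python
import numpy as np

# Made-up numeric positions standing in for date2num output
all_x = np.array([0.5, 1.5, 1.6, 2.5, 2.6, 2.7, 4.5])
passed_x = np.array([0.5, 1.5, 2.5, 2.6])

edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # shared bin edges
total, _ = np.histogram(all_x, bins=edges)        # runs per bin
n_passed, _ = np.histogram(passed_x, bins=edges)  # passed runs per bin

# Divide only where a bin is non-empty; empty bins stay at 0 instead of NaN
share_passed = np.divide(n_passed, total,
                         out=np.zeros(len(total)), where=total > 0)
share_failed = np.where(total > 0, 1.0 - share_passed, 0.0)
```

The full script that follows applies the same division to date-based bins and plots the two shares as stacked bars.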

from __future__ import division
import matplotlib.pyplot as plt
import matplotlib.dates
import datetime
import numpy as np
import pandas as pd

dates = pd.date_range("2016/01/01","2016/06/01" )
dates2 = pd.date_range("2016/02/01","2016/03/17", freq="18H")
dates = dates.append(dates2)

passed = np.round(np.random.rand(len(dates))+0.231).astype(np.int8)
tests = pd.DataFrame({"Date" : dates, "IsPassed": passed})

fig, ax = plt.subplots()
passed = tests[tests.IsPassed == 1]
failed = tests[tests.IsPassed == 0]
all_dates = matplotlib.dates.date2num(tests.Date.astype(datetime.datetime))
passed_dates = matplotlib.dates.date2num(passed.Date.astype(datetime.datetime))
failed_dates = matplotlib.dates.date2num(failed.Date.astype(datetime.datetime))

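# Bin all runs first, then reuse the same bin edges for the passed and failed subsets
hist, bins = np.histogram(all_dates, bins=10)
histpassed, bins_ = np.histogram(passed_dates, bins=bins)
histfailed, bins__ = np.histogram(failed_dates, bins=bins)

# Plot the per-bin fractions as stacked bars centered in each bin
# (assumes every bin contains at least one run; an empty bin would give 0/0, i.e. NaN)
binwidth=bins[1]-bins[0]
ax.bar(bins[:-1]+binwidth/2., histpassed/hist, width=binwidth*0.8, label="passed")
ax.bar(bins[:-1]+binwidth/2., histfailed/hist, width=binwidth*0.8, bottom=histpassed/hist, label="failed")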

ax.xaxis.set_major_locator(matplotlib.dates.AutoDateLocator())
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%d.%m.%y'))
ax.legend()
fig.autofmt_xdate()
plt.savefig(__file__+".png")
plt.show()
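
As a possible alternative to the NumPy histograms, the bucketing and counting can also be done in pandas without explicit loops. This is only a sketch: it uses the sample rows from the question, n_buckets is a hypothetical name for the configurable bucket count, and the Agg backend is assumed so that it runs without a display.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import pandas as pd

csv_text = """TestName;Date;IsPassed
test1;12/8/2016 9:44:30 PM;0
test1;12/8/2016 9:39:00 PM;0
test1;12/8/2016 9:38:29 PM;1
test1;12/8/2016 9:38:27 PM;1
test2;12/8/2016 5:05:02 AM;1
test3;12/7/2016 8:58:36 PM;0
test3;12/7/2016 8:57:19 PM;1
test3;12/7/2016 8:56:15 PM;1
test4;12/5/2016 6:50:49 PM;0
test4;12/5/2016 6:49:50 PM;0
test4;12/5/2016 3:23:09 AM;1
test4;12/4/2016 11:51:29 PM;1
"""
tests = pd.read_csv(io.StringIO(csv_text), sep=";")
tests["Date"] = pd.to_datetime(tests["Date"], format="%m/%d/%Y %I:%M:%S %p")

n_buckets = 4  # hypothetical name: number of equal-width time buckets
buckets = pd.cut(tests["Date"], bins=n_buckets)

# IsPassed is 0/1, so its mean per bucket is the share of passed runs.
# observed=True drops buckets that contain no runs at all.
pass_share = tests.groupby(buckets, observed=True)["IsPassed"].mean()
shares = pd.DataFrame({"passed": pass_share, "failed": 1 - pass_share})

# Each row sums to 1, so the stacked bars all have the same height
ax = shares.plot.bar(stacked=True)
ax.figure.autofmt_xdate()
```

Multiplying shares by 100 before plotting gives a percentage axis instead of fractions.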


ImportanceOfBeingErnest
  • Nice! Thanks for the hint. Now is there a way to switch from the absolute count on Y axis to percentage? Right now some bins sum up to over a thousand runs, while others are less than 100... – Nick Slavsky Feb 14 '17 at 08:30
  • See edited answer. If this still does not help, feel free to ask further. – ImportanceOfBeingErnest Feb 14 '17 at 09:42
  • This is exactly what I've been looking for! Thanks, @ImportanceOfBeingErnest! There are a couple of things that would like to change to adapt to my needs, but those are off the topic of this question. – Nick Slavsky Feb 14 '17 at 10:46