
I want to plot a histogram of financial data. Specifically, I want to plot a histogram of index returns, with the return on the x-axis and the number of observations on the y-axis.

I used the following code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

returns = (0.08,0.05,-0.15,0.12,-0.18,0.02,0.25)
years = (2010,2011,2012,2013,2014,2015,2016)
data = pd.DataFrame(returns, columns = ["Return"], index = years)
bins = [-0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5]

plt.hist(data, bins = bins, edgecolor = "white")

This code yields a plain histogram. Two of the returns fall between -20% and -10% (2012 and 2014). I would like the bar for that range to be split into two boxes stacked on top of each other, with the year of each observation displayed inside its box, so that the plot shows which years belong to each bin. I cannot find any hint in the documentation on how to do that.

  • Does this answer your question? [Matplotlib - label each bin](https://stackoverflow.com/questions/6352740/matplotlib-label-each-bin) – SoakingHummer Apr 29 '21 at 07:54
  • Please sketch the expected result (e.g. in Paint). That would make it much easier to understand what you want to achieve. – dankal444 Apr 29 '21 at 08:05
  • @dankal444 The plot in the answer below is the result I want to achieve. The remaining issue is that a couple of boxes are not in the range where they belong. – Kyle_Stockton Apr 29 '21 at 15:47

1 Answer


You can use np.digitize to determine which bin each value belongs to. Then you can iterate through the rows of the dataframe and draw a box for each year at the desired location. An array holding the current stack height of each bin (the bottom of the next box) needs to be updated at each step.

As np.digitize starts counting the bins from 1 and uses 0 for values below the smallest bin edge, 1 needs to be subtracted for the result to serve as an index into the arrays. Further, values at or above the largest bin edge get an index that is too high, so they need to be filtered out as well.
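To make the indexing concrete, here is a minimal sketch of what np.digitize returns for a few sample values, using the bin edges from the question. The sample values and the names `raw`, `idx` and `valid` are made up for illustration and are not part of the answer's code.

```python
import numpy as np

bins = [-0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5]
# Hypothetical sample values: one below the range, some inside, two at or above the top edge
values = np.array([-0.5, -0.3, -0.15, 0.05, 0.45, 0.5, 0.7])

raw = np.digitize(values, bins)  # 0 means "below bins[0]", len(bins) means "at or above bins[-1]"
idx = raw - 1                    # zero-based index of the left edge of each value's bin
valid = (idx >= 0) & (idx < len(bins) - 1)  # only these indices point at a real bin

for v, r, i, ok in zip(values, raw, idx, valid):
    print(f"value={v:+.2f}  digitize={r}  index={i}  in range={ok}")
```

Note that with the default right=False a value exactly equal to the last edge (0.5 here) is treated as out of range, whereas plt.hist would count it in the last bin.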

import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import pandas as pd
import numpy as np

returns = (0.08, 0.05, -0.15, 0.12, -0.18, 0.02, 0.25)
years = (2010, 2011, 2012, 2013, 2014, 2015, 2016)
data = pd.DataFrame(returns, columns=['Return'], index=years)
bins = [-0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5]
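bin_width = np.diff(bins)
# np.digitize numbers the bins from 1 and returns 0 for values below the smallest bin edge,
# so subtract 1 to get a zero-based index into bins and bin_width
data['Bin_ind'] = np.digitize(data['Return'], bins) - 1

fig, ax = plt.subplots()
bottoms = np.zeros(len(bins) - 1)  # current stack height of each bin
for year, bin_ind in data['Bin_ind'].items():  # Series.iteritems() was removed in pandas 2.0
    # skip values that fall outside the bin range
    if 0 <= bin_ind < len(bins) - 1:
        # draw a box of height 1 on top of the boxes already in this bin
        ax.bar(x=bins[bin_ind], height=1, bottom=bottoms[bin_ind], width=bin_width[bin_ind], align='edge',
               facecolor='turquoise', alpha=0.6, edgecolor='black')
        # write the year in the center of the box
        ax.text(bins[bin_ind] + bin_width[bin_ind] / 2, bottoms[bin_ind] + 0.5, year, ha='center', va='center')
        bottoms[bin_ind] += 1
ax.set_xticks(bins)
ax.yaxis.set_major_locator(MultipleLocator(1))  # integer ticks on the count axis
plt.show()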

histogram with individual values
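A quick way to check that the boxes land in the right bins is to compare the per-bin counts derived from the np.digitize indices with what np.histogram reports for the same edges. This is only a sanity-check sketch; the names `bin_ind`, `counts_from_digitize` and `counts_from_histogram` are illustrative.

```python
import numpy as np

returns = np.array([0.08, 0.05, -0.15, 0.12, -0.18, 0.02, 0.25])
bins = [-0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5]

# Zero-based bin index per return, as in the answer
bin_ind = np.digitize(returns, bins) - 1
in_range = (bin_ind >= 0) & (bin_ind < len(bins) - 1)

# Number of boxes stacked in each bin
counts_from_digitize = np.bincount(bin_ind[in_range], minlength=len(bins) - 1)

# Counts that a regular histogram would show for the same edges
counts_from_histogram, _ = np.histogram(returns, bins=bins)

print(counts_from_digitize)
print(counts_from_histogram)
```

For this data the two arrays agree. They can differ only for a value exactly equal to the last edge, which np.histogram includes in the last bin and np.digitize (with the default right=False) does not.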

The approach can be extended to color code each rectangle depending on the return value, for example:

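cmap = plt.get_cmap('coolwarm_r')  # plt.cm.get_cmap was removed in Matplotlib 3.9
norm = plt.Normalize(-0.5, 0.5)  # map returns in [-0.5, 0.5] onto the colormap
fig, ax = plt.subplots()
bottoms = np.zeros(len(bins) - 1)  # current stack height of each bin
for year, bin_ind in data['Bin_ind'].items():  # Series.iteritems() was removed in pandas 2.0
    if 0 <= bin_ind < len(bins) - 1:  # skip values outside the bin range
        ax.bar(x=bins[bin_ind], height=1, bottom=bottoms[bin_ind], width=bin_width[bin_ind], align='edge',
               facecolor=cmap(norm(data.loc[year, 'Return'])), edgecolor='black')
        ax.text(bins[bin_ind] + bin_width[bin_ind] / 2, bottoms[bin_ind] + 0.5, year, ha='center', va='center')
        bottoms[bin_ind] += 1
ax.set_xticks(bins)
plt.show()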

color coded bins
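If the colors should be readable as values, a colorbar can be attached using the same colormap and normalization. The following is a sketch under assumptions, not part of the original answer: it draws only two example boxes (the 2012 and 2014 returns), selects the non-interactive Agg backend so it runs without a display, and the names `sm` and `cbar` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

cmap = plt.get_cmap("coolwarm_r")
norm = Normalize(-0.5, 0.5)

fig, ax = plt.subplots()
# Two example boxes stacked in the -0.2 to -0.1 bin
for bottom, (year, ret) in enumerate([(2012, -0.15), (2014, -0.18)]):
    ax.bar(x=-0.2, height=1, bottom=bottom, width=0.1, align="edge",
           facecolor=cmap(norm(ret)), edgecolor="black")
    ax.text(-0.15, bottom + 0.5, str(year), ha="center", va="center")

# A ScalarMappable carries the colormap and normalization for the colorbar
sm = ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])  # only needed on older Matplotlib versions, harmless otherwise
cbar = fig.colorbar(sm, ax=ax, label="Return")
```

In the full plot the loop over data['Bin_ind'] would replace the two hard-coded boxes. The colorbar lines stay the same.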

JohanC
  • This is so close to what I expected! The only issue that's left is that the boxes are one bin too far to the right. For example, 2011 and 2015 have returns of 0.05 and 0.02, but their boxes in the plot are between 0.1 and 0.2. So they are too far to the right, and I don't know how to fix that. – Kyle_Stockton Apr 29 '21 at 15:45
  • So I basically need to shift the entire plot one box to the left – Kyle_Stockton Apr 29 '21 at 15:48
  • Thank you for checking this out and letting me know. I forgot `np.digitize` starts numbering the bins at 1. I updated the post accordingly. – JohanC Apr 29 '21 at 16:55