I am trying to finish a task for a project: creating a histogram of the yearly returns of the Dow Jones from historical data. I have uploaded a picture of the task and my progress below. My problem is that I can't find a way to separate the years within the histogram as shown in the task, and I don't know how to modify the y-axis and the legend to show the information that appears in the first picture.

Any help is appreciated

Images: what I am trying to make, and my progress so far.

Here is my code:

# Importing packages
import pandas as pd
import matplotlib.pyplot as plt

# Bin edges for the histogram (annual return in %)
order = [-60, -50, -40, -30, -20, -10,
         0, 10, 20, 30, 40, 50, 60, 70]

# Getting the data (read_csv already returns a DataFrame)
dow_jones = pd.read_csv('data/dow-jones-by-year-historical-annual-returns (2).csv')

# Keep only the year of each date
dow_jones['date'] = pd.to_datetime(dow_jones['date']).dt.year

# Make sure the returns are numeric; the result has to be assigned back
dow_jones['value'] = pd.to_numeric(dow_jones['value'])

# Rows 0-98 go into the histogram; the last row is kept separately
up_to_2019 = dow_jones.iloc[0:99]
lastyear = dow_jones.iloc[-1]

# Plotting the histogram
fig, ax = plt.subplots()
up_to_2019['value'].plot.hist(bins=order, ax=ax)
plt.show()
Tyberius
  • Hi, welcome to SO. Currently it's hard to help you without any data. Please provide a minimal reproducible example. Also, you will get better answers if you ask a more specific question. – Björn Apr 17 '20 at 13:48
  • Thank you for your comment, Björn. I understand that it is hard to edit my code without the data, and I apologise if I have overcomplicated my code. However, I was hoping to get a lead on how I can stack the years on my histogram as in the first picture, or how to stretch the y-axis to include the percentages. Anyway, anything that can help me at this point is perfect for me. Thank you! – hajredinpasha Apr 17 '20 at 13:52

1 Answer

Hi, just to give you some further directions:

Regarding the text box
The text box looks like it contains the summary statistics from DataFrame.describe() plus a few additional ones. You can create a text box by combining .subplots() and .text(). I found this guide to be very useful for creating a text box in a plot.

Since we don't have the data, here is some pseudocode:

import matplotlib.pyplot as plt

# up_to_2019 and order are the DataFrame and bin edges from the question's code
fig, ax = plt.subplots()
textstr = str(up_to_2019['value'].describe())

ax.hist(up_to_2019['value'], bins=order)

# these are matplotlib.patches.Patch properties
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)

# place a text box in upper left in axes coords
ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=10,
        verticalalignment='top', bbox=props)

plt.show()
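If the raw describe() output is too noisy for the box, the text can be assembled by hand. Below is a minimal sketch of that idea. The returns are made-up placeholder numbers (not real Dow Jones data), summary_text is a hypothetical helper name, and the median and skewness are only my guess at what the "additional" statistics in the picture might be.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder returns in percent (made up for illustration, NOT real data)
returns = pd.Series([12.5, -3.2, 25.1, 7.4, -33.8, 18.9, 2.3, -6.1, 14.0, 9.7])


def summary_text(values: pd.Series) -> str:
    """Build a multi-line label from describe() plus two extra statistics."""
    stats = values.describe()  # count, mean, std, min, quartiles, max
    lines = [
        f"Observations: {int(stats['count'])}",
        f"Mean: {stats['mean']:.2f}%",
        f"Median: {values.median():.2f}%",  # extra statistic (assumed)
        f"Std. dev.: {stats['std']:.2f}%",
        f"Min: {stats['min']:.2f}%",
        f"Max: {stats['max']:.2f}%",
        f"Skewness: {values.skew():.2f}",   # extra statistic (assumed)
    ]
    return "\n".join(lines)


fig, ax = plt.subplots()
ax.hist(returns, bins=list(range(-60, 80, 10)))

# Same text-box placement as above, but with the hand-built label
ax.text(0.05, 0.95, summary_text(returns), transform=ax.transAxes,
        fontsize=10, verticalalignment="top",
        bbox=dict(boxstyle="round", facecolor="wheat", alpha=0.5))
```

Call plt.show() or fig.savefig(...) afterwards to see the result; swapping returns for up_to_2019['value'] should be all that is needed for the real data.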

Regarding the y-axis:
1) Here is how you set the right label: plt.ylabel("Number of Observations\n(Probability in %)")
2) Then add the ticks: plt.yticks(np.arange(1, 27))
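To actually print the percentage next to each count, one option is a tick formatter. This is only a sketch under assumptions: the values are made-up placeholders, count_with_share is a hypothetical helper, and I am assuming "probability" means the count divided by the total number of observations.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Placeholder returns in percent (made up for illustration, NOT real data)
values = [12.5, -3.2, 25.1, 7.4, -33.8, 18.9, 2.3, -6.1, 14.0, 9.7]
n_obs = len(values)


def count_with_share(count, _pos=None):
    """Format a tick as 'count (share of all observations in %)'."""
    return f"{int(count)} ({100 * count / n_obs:.1f}%)"


fig, ax = plt.subplots()
counts, edges, patches = ax.hist(values, bins=list(range(-60, 80, 10)))

ax.set_ylabel("Number of Observations\n(Probability in %)")
# One tick per whole observation, from 0 up to the tallest bar
ax.set_yticks(np.arange(0, counts.max() + 1))
ax.yaxis.set_major_formatter(FuncFormatter(count_with_share))
```

With the real data, n_obs would be len(up_to_2019) and the bins would be the order list from the question.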

Regarding the labels inside the bins
That's rather tricky. One option, though definitely not advised, would be to also add the labels via the .text() method. I don't know if it helps, but here is how you do this in R.
These two links might also be helpful:

Apparently plt.hist() returns three values, one of which is called patches. You can iterate over the patches and, for example, change their color (see the link above); however, I couldn't figure out how to attach text to them.

import matplotlib.pyplot as plt

x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 35, 36, 37, 18, 49, 50, 100]
num_bins = 5
n, bins, patches = plt.hist(x, num_bins, facecolor='blue', alpha=0.5)
for i, pat in enumerate(patches):
    pat.set_edgecolor('black')  # changing patch properties like this works
    # pat.set_text("Test")  # this doesn't work, sadly: patches have no text setter
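For what it's worth, the .text() route mentioned above can be combined with the patches: each patch knows its own position and size, so those can be used to place a label on the Axes. A minimal sketch with the same toy data (the label content, here just the bin count, is an arbitrary choice for illustration):

```python
import matplotlib.pyplot as plt

x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 35, 36, 37, 18, 49, 50, 100]

fig, ax = plt.subplots()
n, bins, patches = ax.hist(x, 5, facecolor="blue", alpha=0.5)

labels = []
for count, pat in zip(n, patches):
    # Centre of the bar, computed from the rectangle's geometry
    x_center = pat.get_x() + pat.get_width() / 2
    y_center = pat.get_height() / 2
    # The text belongs to the Axes, not to the patch itself
    label = ax.text(x_center, y_center, f"{int(count)}",
                    ha="center", va="center")
    labels.append(label)
```

This only gives one label per bar. Putting several year labels into one bin needs one box per year, which a plain histogram does not provide.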
Björn
  • Thank you very much for your help, Björn. Your points helped me make a great deal of progress. I hope that this code can help and give a lead to many other people who are trying to solve the same issues :) – hajredinpasha Apr 17 '20 at 15:04
  • Glad I could help :) – Björn Apr 17 '20 at 15:07
  • @hajredinpasha how did it go? If it's not resolved, could you update the question so that we can try to help without reinventing the wheel? – wwnde Apr 18 '20 at 06:00
  • @wwnde thank you for offering your help and for asking for an update on this project. In the end, I managed to resolve the issue with some help from my friends. The way to separate the years is to create a stacked bar chart (I have a histogram in my code, which is the wrong approach). As for the separation itself, there are two for loops: one goes through the data to sort it into the categories specified on the x-axis, and the other appends the data to the bar plot. – hajredinpasha Apr 19 '20 at 08:33
  • @hajredinpasha OK, have you updated the code, or are you willing to share the section that appends the broken boxes to the bar plot, so that I can see if it is completely different from the approach I had? – wwnde Apr 19 '20 at 09:01
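
For readers landing here: the asker's final code was not posted, so the following is only a guess at what the stacked-bar approach described in the comments might look like. The (year, return) pairs are made-up placeholders, and the 10-point bin width, colors, and variable names are all assumptions.

```python
import math

import matplotlib.pyplot as plt

# Placeholder (year -> annual return in %) pairs, made up for illustration
data = {2010: 12.5, 2011: -3.2, 2012: 25.1, 2013: 7.4, 2014: -33.8,
        2015: 18.9, 2016: 2.3, 2017: -6.1, 2018: 14.0, 2019: 9.7}

bin_width = 10
edges = list(range(-60, 80, bin_width))

# Loop 1: sort every year into the bin its return falls in
years_per_bin = {left: [] for left in edges[:-1]}
for year, ret in data.items():
    left = math.floor(ret / bin_width) * bin_width  # left edge of the bin
    years_per_bin[left].append(year)

# Loop 2: draw one unit-height box per year, stacked within each bin
fig, ax = plt.subplots()
for left, years in years_per_bin.items():
    for level, year in enumerate(years):
        ax.bar(left, 1, width=bin_width, bottom=level, align="edge",
               facecolor="lightsteelblue", edgecolor="black")
        ax.text(left + bin_width / 2, level + 0.5, str(year),
                ha="center", va="center", fontsize=8)

ax.set_xticks(edges)
ax.set_xlabel("Annual return (%)")
ax.set_ylabel("Number of Observations")
```

Because every year is its own rectangle, each box can carry its own label, which is what a single histogram bar cannot do. Returns outside the range covered by edges would raise a KeyError in this sketch, so the edges need to span the data.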