
Below is my data; I would like to plot the frequency of each month. Some years have missing MONTH values.

YEAR    MONTH
1960    5
1961    7
1961    8
1961    11
1962    5
1963    6
1964    
1965    7
1966    7
1966    7
1966    10
1967    4
1967    8
1968    
1969    
1970    8
1971    6
1971    9
1971    10
1972    7
1973    6
1973    9
1974    10
1974    10
1975    10
1976    
1977    
1978    9
1979    11
1980    7
1980    7
1980    8
1981    
1982    10
1982    12
1983    
1984    7
1985    9
1986    
1987    
1988    9
1988    10
1989    7
1989    10
1990    
1991    7
1992    
1993    6
1993    7
1993    9
1993    9
1994    
1995    7
1996    8
1996    9
1997    5
1998    8
1998    9
1998    10
1999    8
1999    9
2000    9
2001    
2002    1
2003    5
2003    7
2003    8
2003    9
2003    10
2004    
2005    11
2006    7
2006    10
2007    9
2007    11
2007    11
2008    5
2009    5
2009    7
2009    9
2009    9
2010    10
2011    5
2011    9
2011    9
2012    8
2013    7
2014    9
2015    7
2016    
2017    8
2018    10
2019    11
2020    
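A minimal sketch of loading a few rows of the table above, assuming it is saved as whitespace-separated text (the question's own file is a CSV with more columns). Blank MONTH entries are read as NaN, and the nullable `Int64` dtype keeps the months as whole numbers while still allowing those gaps.

```python
import io

import pandas as pd

# An excerpt of the data above; 1964 has no MONTH value, as in the question.
raw = """YEAR MONTH
1960 5
1961 7
1961 8
1961 11
1962 5
1963 6
1964
1965 7
"""

# sep=r"\s+" treats any run of whitespace as the delimiter.
# Rows with a missing trailing field get NaN in that column.
ISA = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# NaN forces MONTH to float; the nullable Int64 dtype restores whole numbers
# while keeping the missing value as <NA>.
ISA["MONTH"] = ISA["MONTH"].astype("Int64")
print(ISA)
```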

I used the following code in a Jupyter Notebook. There are other columns but I selected only the month.

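#Plot Frequency
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
ISA = pd.read_csv (r'G\:data.csv', encoding="ISO-8859-1")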
ISA = pd.DataFrame(ISA,columns=['YEAR','MONTH','TYPE'])
ISA=  ISA[ISA['YEAR'].between(1960,2020, inclusive="both")]
ISA['YEAR'] = pd.to_datetime(ISA['MONTH'])
ISA = ISA.set_index('YEAR')
ISA=ISA.drop(['MSW','TC NAME', 'KNOTS','PAR BEG', 'PAR END'],axis=1)
ISA=ISA.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
ax=ISA.groupby([ISA.index.month, 'MONTH']).count().plot(kind='bar',color='lightgray',width=1, edgecolor='darkgray')
plt.xlabel('Month', color='black', fontsize=14, weight='bold')
plt.ylabel('Monthly frequency' , color='black', fontsize=14, weight='bold',)
plt.xticks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct','Nov','Dec'],rotation=0, fontsize=12)
ax.yaxis.set_major_formatter(FormatStrFormatter('%.0f'))
plt.yticks(fontsize=12)
plt.ylim(0,20)
plt.suptitle("Monthly Frequency",fontweight='bold',y=0.95,x=0.53)
plt.title("ISA", pad=0)
L=plt.legend()
L.get_texts()[0].set_text('Frequency')
plt.bar_label(ax.containers[0], label_type='center', fontsize=11)
plt.plot()
plt.tight_layout()
plt.show()

With this code, the resulting plot shows bars for February and other months whose count should be zero. Can you help me adjust the bar chart, or tell me if something is wrong with my code?

Here is the output image: [image: resulting monthly frequency bar chart]
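A sketch of why the bars end up under the wrong month names, assuming MONTH holds plain integers. Called without `unit`, `pd.to_datetime` treats integers as nanoseconds since 1970-01-01, so after the `pd.to_datetime(ISA['MONTH'])` line, `ISA.index.month` is 1 for every row. The groupby then yields one group only per month that actually occurs, and `plot(kind='bar')` places those bars at positions 0, 1, 2, ... regardless of month number, while the twelve hard-coded tick labels assume one bar per month. In the sample data the months present are 1 and 4 through 12, so the second bar (April) would sit under the 'Feb' label.

```python
import pandas as pd

months = pd.Series([5, 7, 8, 11], dtype="int64")

# With no `unit` or `format`, integers are read as nanoseconds since the epoch,
# so every month number becomes a timestamp on 1970-01-01.
as_dates = pd.to_datetime(months)
print(as_dates.dt.month.tolist())  # [1, 1, 1, 1]

# Counting only produces entries for months that actually occur,
# so a bar chart built from this gets one bar per *present* month.
counts = months.value_counts().sort_index()
print(counts.index.tolist())  # [5, 7, 8, 11]
```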

user2543

1 Answer


This comes close with your supplied example data:

# Read the initial data to a dataframe
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.ticker as mtick
ISA = pd.read_csv(r'data.txt', sep=r'\s+')  # any run of whitespace is a delimiter; `delim_whitespace=True` is deprecated in recent pandas
ISA = pd.DataFrame(ISA,columns=['YEAR','MONTH'])
ISA['MONTH'] = ISA['MONTH'].astype(dtype='Int64')
ISA=  ISA[ISA['YEAR'].between(1960,2020, inclusive="both")] 
# Use `value_counts()` to count each month number, filling in zero for months
# that never appear in the imported data
month_counts = ISA['MONTH'].value_counts()
months_count_collected = {}
for x in range(1, 13):
    if x in month_counts:
        months_count_collected[x] = month_counts[x]
    else:
        months_count_collected[x] = 0
# Make a dataframe of frequencies from `months_count_collected`, with the zero-count months included
df = pd.DataFrame.from_dict(months_count_collected, orient='index', columns = ["Frequency"])
# Make plot from frequency dataframe
ax = df.sort_index().plot(kind='bar',color='lightgray',width=1, edgecolor='darkgray'); # `sort_index()` isn't
# needed here, but it would help if counts for missing months were added later or differently,
# so it's left in; `sort_index()` use based on https://stackoverflow.com/a/57876952/8508004 .
# Set tick labels to the month names based on https://stackoverflow.com/a/30280076/8508004
ax.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct','Nov','Dec'],rotation=0, fontsize=12);
ax.set_xlabel('Month', color='black', fontsize=14, weight='bold')
ax.set_ylabel('Annual frequency' , color='black', fontsize=14, weight='bold',)
#ax.set_title("Passage Frequency", pad=0);
#plt.yaxis.set_major_formatter(FormatStrFormatter('%.0f'))
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.0f')) # based on OP code and https://stackoverflow.com/a/36319915/8508004 to import and use `mtick` with Pandas
plt.yticks(fontsize=12)
ax.set_ylim(0,20)
plt.suptitle("Monthly Frequency",fontweight='bold',y=0.95,x=0.53)
plt.title("ISA", pad=0)
L=plt.legend()
L.get_texts()[0].set_text('Frequency')
plt.bar_label(ax.containers[0], label_type='center', fontsize=11)
plt.plot()
plt.tight_layout()
plt.show();

There's probably a cleverer way to fill in the months that don't appear in the input.
Also, the titles and labels are generated, but their text may not be correct yet.
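One such way, as a sketch rather than the approach used above: take `value_counts()` once and `reindex` it onto the month numbers 1 to 12 with `fill_value=0`, which restores the absent months as zero counts. The sample Series is made up for illustration, and `calendar.month_abbr` gives locale-dependent names ('Jan', 'Feb', ... in the default locale).

```python
import calendar

import pandas as pd

# Hypothetical sample: a few month values with gaps and one missing entry.
month_col = pd.Series([5, 7, 8, 11, 5, None, 7], dtype="Int64")

# Drop missing months, count the rest, then reindex onto 1..12 so that
# months that never occur appear with a count of 0.
freq = (
    month_col.dropna()
    .astype("int64")
    .value_counts()
    .reindex(range(1, 13), fill_value=0)
)

# Optional: label the index with abbreviated month names.
# calendar.month_abbr[0] is an empty string, so it is skipped.
freq.index = list(calendar.month_abbr)[1:]
print(freq)
```

The resulting Series could stand in for the loop-built `df` in the answer, e.g. `freq.to_frame('Frequency').plot(kind='bar', ...)`; because the index already holds month names, the `set_xticklabels` call becomes optional.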

What it makes: [image: resulting bar chart]

Wayne
  • For those curious, it looks like related data is plotted [here](https://stackoverflow.com/q/75092508/8508004) from year to year. – Wayne Jan 12 '23 at 21:10