
I am working through this: https://medium.com/diogo-menezes-borges/introduction-to-statistics-for-data-science-6c246ed2468d

About 3/4 of the way through there is a histogram, but the author does not supply the code used to generate it.

So I decided to give it a go...

I have everything working, but I would like to add minor ticks to my plot.

X-axis only, spaced 200 units apart (matching the bin width used in my code).

In particular, I would like minor ticks in the style of the last example on this page: https://matplotlib.org/3.1.0/gallery/ticks_and_spines/major_minor_demo.html

I have tried several times but I just can't get that exact 'style' to work on my plot.

Here is my working code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

print('NumPy: {}'.format(np.__version__))
print('Pandas: {}'.format(pd.__version__))
print('\033[1;31m' + '--------------' + '\033[0m')  # Bold red

display_settings = {
    'max_columns': 15,
    'max_colwidth': 60,
    'expand_frame_repr': False,  # Do not wrap wide frames across multiple lines
    'max_rows': 50,
    'precision': 6,
    'show_dimensions': False
}
# pd.options.display.float_format = '{:,.2f}'.format

for op, value in display_settings.items():
    pd.set_option("display.{}".format(op), value)

file = "e:\\python\\pandas\\medium\\sets.csv"
lego = pd.read_csv(file, encoding="utf-8")
print(lego.shape, '\n')
print(lego.info(), '\n')
print(lego.head(), '\n')
print(lego.isnull().sum(), '\n')

dfs = [lego]
names = ['lego']


def NaN_percent(_df, column_name):
    # Percentage of missing values in one column of _df.
    empty_values = _df[column_name].isnull().sum()
    return (100.0 * empty_values) / len(_df)


c = 0
print('Columns with missing values expressed as a percentage.')
for df in dfs:
    print('\033[1;31m' + ' ' + names[c] + '\033[0m')
    row_count = df.shape[0]
    for i in list(df):
        x = NaN_percent(df, i)
        if x > 0:
            print('  ' + i + ': ' + str(x.round(4)) + '%')
    c += 1
    print()

# What is the average number of parts in the sets of legos?
print(lego['num_parts'].mean(), '\n')

# What is the median number of parts in the sets of legos?
print(lego['num_parts'].median(), '\n')

print(lego['num_parts'].max(), '\n')

# Create Bins for Data Ranges
bins = []

for i in range(lego['num_parts'].min(), 6000, 200):
    bins.append(i + 1)

# Use 'right' to determine which bin overlapping values fall into.
cuts = pd.cut(lego['num_parts'], bins=bins, right=False)

# Count values in each bin.
print(cuts.value_counts(), '\n')

plt.hist(lego['num_parts'], color='red', edgecolor='black', bins=bins)
plt.title('Histogram of Number of parts')
plt.xlabel('Bin')
plt.ylabel('Number of values per bin')
plt.axvline(x=162.2624, color='blue')
plt.axvline(x=45.0, color='green', linestyle='--')
# https://matplotlib.org/gallery/text_labels_and_annotations/custom_legends.html

legend_elements = [Line2D([0], [0], color='blue', linewidth=2, linestyle='-'),
                   Line2D([0], [1], color='green', linewidth=2, linestyle='--')
                   ]
labels = ['mean: 162.2624', 'median: 45.0']
plt.legend(legend_elements, labels)
plt.show()
MarkS
  • Pardon my ignorance of matplotlib, but for both of those I get: unresolved reference 'ax' when I try to insert either of them into my code. – MarkS Jan 11 '20 at 00:13
  • Ah... Those were three separate lines, plus I had to add: ```from matplotlib.ticker import (AutoMinorLocator)``` It's working. Resubmit as an answer and I will accept it. – MarkS Jan 11 '20 at 00:37

1 Answer


You can just add:

from matplotlib.ticker import AutoMinorLocator

ax = plt.gca()  # the axes that plt.hist() drew on
ax.xaxis.set_minor_locator(AutoMinorLocator())
ax.tick_params(which='minor', length=4, color='r')
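With its default setting, AutoMinorLocator splits each major interval into 4 or 5 parts, so the minor spacing depends on where the major ticks land. If you want the minor ticks fixed at 200 units no matter what, MultipleLocator is one option. Below is a minimal sketch, not a drop-in for your script: the data is a synthetic stand-in for lego['num_parts'], the bin range assumes a minimum of 0, and the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

# Synthetic stand-in for lego['num_parts'] (assumption: the real values come from sets.csv)
rng = np.random.default_rng(0)
num_parts = rng.exponential(scale=160, size=1000).astype(int)

# Same bin construction as in the question, assuming the minimum value is 0
bins = list(range(1, 6000, 200))
plt.hist(num_parts, color='red', edgecolor='black', bins=bins)

ax = plt.gca()
ax.xaxis.set_minor_locator(MultipleLocator(200))  # minor ticks at multiples of 200
ax.tick_params(which='minor', length=4, color='r')

# Minor tick positions that matplotlib will actually draw
minor_ticks = ax.xaxis.get_minorticklocs()
```

Note that MultipleLocator(200) puts ticks on multiples of 200 (0, 200, 400, ...), while the bins in the question start at 1, so the ticks sit one unit to the left of the bin edges. Minor ticks that coincide with a major tick are not drawn separately.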

See this post to get a better idea of the difference between plt, ax and fig. In broad terms, plt refers to matplotlib's pyplot module. fig is one figure, which can contain one or more subplots. ax refers to one subplot together with its x- and y-axes, including the units, tick marks, tick labels, etc. Many matplotlib functions are called through plt (for example plt.hist), but under the hood they draw on the "current axes", which you can obtain with plt.gca() ("get current axes"). It is not always clear which functions can be called via plt and which exist only on ax. Sometimes the two forms also have slightly different names. You'll need to check the documentation or search StackOverflow to find which form is needed in each specific case.
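To make the plt/ax relationship concrete, here is a small sketch that builds a histogram through the axes object directly, with the pyplot equivalent noted next to each call. The data values and bin range are made up for illustration, and the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator

fig, ax = plt.subplots()      # one figure containing one subplot
same_axes = plt.gca() is ax   # pyplot's "current axes" is the subplot just created

data = [10, 45, 45, 160, 420, 900, 1500]  # made-up values, not the Lego data
ax.hist(data, bins=range(0, 2001, 200), color='red', edgecolor='black')  # plt.hist(...)
ax.set_title('Histogram of Number of parts')       # plt.title(...)
ax.set_xlabel('Bin')                               # plt.xlabel(...)
ax.set_ylabel('Number of values per bin')          # plt.ylabel(...)
ax.axvline(x=45.0, color='green', linestyle='--')  # plt.axvline(...)

# Tick locators are set through the axes object
ax.xaxis.set_minor_locator(AutoMinorLocator())
ax.tick_params(which='minor', length=4, color='r')
```

Here hist and axvline keep the same name in both forms, while title, xlabel and ylabel become set_title, set_xlabel and set_ylabel on the axes.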

JohanC