
I have started using Python for lots of data problems at work, and the datasets are always slightly different. I'm trying to find more efficient ways of plotting data using the built-in pandas functions, rather than writing out the plotting code for each column individually and editing the formatting to get a nice result.

Background: I'm using Jupyter notebook and looking at histograms where the values are all unique integers.

Problem: I want the xtick labels to align with the centers of the histogram bars when plotting multiple columns of data with a single call, e.g. df.hist(), which draws histograms of all columns at once.

  • Does anyone know if this is possible?
  • Or is it recommended to do each graph on its own vs. using the inbuilt function applied to all columns?

[Image: histograms of all columns as produced by df.hist()]

I can modify them individually by following this post: "Matplotlib xticks not lining up with histogram". That gives me what I want, but only for one graph at a time and with some manual processing of the values.

Desired outcome example for one graph:

[Image: histogram of weight with the xtick labels centered under the bars]

Basic example of data I have:

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # needed for the plt.* calls below

# create list of datapoints
data = [[170,30,210],
        [170,50,200],
        [180,50,210],
        [165,35,180],
        [170,30,190],
        [170,70,190],
        [170,50,190]]

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['height', 'width','weight']) 

# print dataframe. 
df

Code that displays the graphs in the problem statement:

df.hist(figsize=(5,5))
plt.show()

Code that displays the weight graph the way I would like all of them to look:

df.hist(column='weight',bins=[175,185,195,205,215])
plt.xticks([180,190,200,210])
plt.yticks([0,1,2,3,4,5])
plt.xlim([170, 220])
plt.show()
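The hand-typed bin edges and tick positions above can also be computed from the data. The following is only a minimal sketch: `centered_bins` is a hypothetical helper (not part of pandas or matplotlib), and it assumes the values in a column are evenly spaced with a known `step` (10 for the weight column).

```python
import numpy as np

def centered_bins(values, step):
    """Return (edges, centers) for bins of width `step` centered on the data values.

    Hypothetical helper; assumes max(values) - min(values) is a multiple of `step`.
    """
    lo, hi = min(values), max(values)
    # Tick positions: one per possible value between lo and hi.
    centers = np.arange(lo, hi + step, step)
    # Bin edges: shifted by half a step so each center sits mid-bin.
    edges = np.arange(lo - step / 2, hi + step, step)
    return edges, centers

weights = [210, 200, 210, 180, 190, 190, 190]
edges, centers = centered_bins(weights, step=10)
```

For the weight column this reproduces the edges 175 to 215 and the ticks 180 to 210 used above, so they could be passed as `bins=edges` to `df.hist(column='weight', ...)` and to `plt.xticks(centers)`. Columns with unevenly spaced values (such as width here) would need a different rule.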

Any tips or help would be much appreciated!

Thanks

MichaelH

1 Answer


I hope this helps. Take the column and count the frequency of each value with value_counts(), then call sort_index() so the result is ordered by value rather than by frequency, and finally draw it as a bar plot. Each bar then corresponds to exactly one value, so its label sits at the center of the bar.

import pandas as pd
import matplotlib.pyplot as plt

data = [[170,30,210],
        [170,50,200],
        [180,50,210],
        [165,35,180],
        [170,30,190],
        [170,70,190],
        [170,50,190]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['height', 'width','weight'])

# Count each distinct weight, order by value, and draw one bar per value
df.weight.value_counts().sort_index().plot(kind = 'bar')
plt.show()
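The same idea can be extended to every column at once, which is what the question asks for. This is only a sketch under a few assumptions: one subplot per column in a single row, and the figure size and axis label are arbitrary choices, not something from the original post.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [[170, 30, 210],
        [170, 50, 200],
        [180, 50, 210],
        [165, 35, 180],
        [170, 30, 190],
        [170, 70, 190],
        [170, 50, 190]]
df = pd.DataFrame(data, columns=['height', 'width', 'weight'])

# One subplot per column; squeeze=False keeps `axes` 2-D even for a single column.
fig, axes = plt.subplots(1, len(df.columns), figsize=(9, 3), squeeze=False)
for ax, col in zip(axes[0], df.columns):
    # One bar per distinct value, ordered by value, so each label is centered on its bar.
    counts = df[col].value_counts().sort_index()
    counts.plot(kind='bar', ax=ax, title=col)
    ax.set_ylabel('count')
fig.tight_layout()
```

Call `plt.show()` afterwards when running outside a notebook. Note that a bar plot treats the values as categories, so unevenly spaced values (30, 35, 50, 70 in width) are drawn at equal distances rather than to scale.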
Andres Ordorica