
The standard matplotlib boxplot function in Python returns a dictionary with keys for the boxes, medians, whiskers, fliers, and caps. This makes styling really easy.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Create a dataframe and subset it for a boxplot
df1 = pd.DataFrame(np.random.rand(10), columns=['Col1'])
df1['X'] = pd.Series(['A','B','A','B','A','B','A','B','A','B'])
boxes = [df1[df1['X'] == 'A'].Col1, df1[df1['X'] == 'B'].Col1]

# Call the standard matplotlib boxplot function,
# which returns a dictionary including the parts of the graph
mbp = plt.boxplot(boxes)
print(type(mbp))

# This dictionary output makes styling the boxplot easy
plt.setp(mbp['boxes'], color='blue')
plt.setp(mbp['medians'], color='red')
plt.setp(mbp['whiskers'], color='blue')
plt.setp(mbp['fliers'], color='blue')

The Pandas library has an "optimized" boxplot function for its grouped (hierarchically indexed) dataframes. Instead of returning a dictionary for each group, however, it returns a matplotlib.axes.AxesSubplot object. This makes styling very difficult.

# Pandas has a built-in boxplot function that returns
# a matplotlib.axes.AxesSubplot object
pbp = df1.boxplot(by='X')
print(type(pbp))

# The same styling calls raise a TypeError, because the returned
# AxesSubplot object cannot be indexed like a dictionary
plt.setp(pbp['boxes'], color='blue')
plt.setp(pbp['medians'], color='red')
plt.setp(pbp['whiskers'], color='blue')
plt.setp(pbp['fliers'], color='blue')

Are the parts of the boxplot inside the AxesSubplot object produced by the pandas df.boxplot(by='X') function accessible for styling?

Walton Jones

2 Answers


You could also pass return_type='dict'. This returns the boxplot properties directly in a dictionary, which is indexed by each column that was plotted in the boxplot.

To use the example above (in IPython):

import pandas as pd
from numpy.random import rand
import matplotlib.pyplot as plt

# Two numeric columns plus a grouping column with two groups
df = pd.DataFrame(rand(10, 2), columns=['Col1', 'Col2'])
df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])
# return_type='dict' hands back the boxplot parts instead of the axes
bp = df.boxplot(by='X', return_type='dict')

>>> bp.keys()
['Col1', 'Col2']

>>> bp['Col1'].keys()
['boxes', 'fliers', 'medians', 'means', 'whiskers', 'caps']

Now, changing linewidths is a matter of a list comprehension:

>>> [ [item.set_linewidth( 2 ) for item in bp[key]['medians']] for key in bp.keys() ]
[[None, None], [None, None]]
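
The styling from the question can be applied the same way. What follows is only a minimal sketch, assuming a pandas version that supports return_type='dict'; the Agg backend line and the loop variable names are my own assumptions, not part of the original example. Depending on the pandas version, bp may be a dict or a Series keyed by column name, and the loop below works with either.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as the example above
df = pd.DataFrame(np.random.rand(10, 2), columns=['Col1', 'Col2'])
df['X'] = ['A'] * 5 + ['B'] * 5

bp = df.boxplot(by='X', return_type='dict')

# bp maps each plotted column to the usual matplotlib boxplot dictionary
for col in bp.keys():
    parts = bp[col]
    plt.setp(parts['boxes'], color='blue')
    plt.setp(parts['medians'], color='red')
    plt.setp(parts['whiskers'], color='blue')
    # Fliers are drawn as markers, so the marker edge color is set here
    plt.setp(parts['fliers'], markeredgecolor='blue')
```

Using plt.setp instead of a list comprehension avoids building a throwaway list of None values.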
vishakad

I am afraid you have to hard-code it. Take the pandas example: http://pandas.pydata.org/pandas-docs/stable/visualization.html#box-plotting

import pandas as pd
import matplotlib
from numpy.random import rand
import matplotlib.pyplot as plt

df = pd.DataFrame(rand(10, 2), columns=['Col1', 'Col2'])
df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])
# With two plotted columns, bp is an array of axes, one per column
bp = df.boxplot(by='X')
# Keep only the line artists drawn on the first axes
cl = bp[0].get_children()
cl = [item for item in cl if isinstance(item, matplotlib.lines.Line2D)]

Now let's identify which ones are the boxes, medians, etc.:

for i, item in enumerate(cl):
    if item.get_xdata().mean()>0:
        bp[0].text(item.get_xdata().mean(), item.get_ydata().mean(), str(i), va='center', ha='center')

And the plot looks like this:

(Image: the grouped boxplot, with each line labeled by its index in cl.)

Each box consists of 8 items. For example, the item at index 5 within each group of 8 is the median. The last two items (indices 6 and 7) are probably the fliers, of which there are none here.

Knowing this, modifying any part of a box is easy. If we want to set the median to have a linewidth of 2:

n_groups = df['X'].nunique()  # number of groups; 2 in this case
for i in range(n_groups):
    cl[5+i*8].set_linewidth(2.)
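
The order of an axes' children is an implementation detail and may differ between matplotlib versions, so the hard-coded indices should be checked against your own plot. If they prove fragile, one alternative is to group the data yourself and call matplotlib's own boxplot once per column, which returns the usual dictionary. This is a swapped-in technique rather than the approach above, and only a sketch: the names parts and rng, the fixed seed, and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only to make the sketch repeatable
df = pd.DataFrame(rng.random((10, 2)), columns=['Col1', 'Col2'])
df['X'] = ['A'] * 5 + ['B'] * 5

value_cols = ['Col1', 'Col2']
fig, axes = plt.subplots(1, len(value_cols), sharey=True)

parts = {}  # column name -> dictionary returned by matplotlib's boxplot
for ax, col in zip(axes, value_cols):
    grouped = df.groupby('X')[col]
    labels = [name for name, _ in grouped]
    data = [values.to_numpy() for _, values in grouped]
    parts[col] = ax.boxplot(data)
    # Boxes are drawn at positions 1..n by default; label them by group
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)
    ax.set_title(col)
    # Style by name instead of by child index
    plt.setp(parts[col]['medians'], linewidth=2)
```

This gives up the pandas convenience call, but every part is addressed by its key, so nothing depends on how many line artists each box happens to produce.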
CT Zhu
    Awesome! Very helpful @ct-zhu. I took your solution and created a function that takes a pandas dataframe and the column you want to groupby and returns a dictionary for formatting. I would have put it below his answer, but it didn't fit. Here is a [link to a gist on github](https://gist.github.com/waltonjones/7065718). – Walton Jones Oct 20 '13 at 06:43