I'm new to Matplotlib / Python, and am trying to make a grouped boxplot very similar to Joe Kington's excellent example shown here:

how to make a grouped boxplot graph in matplotlib

I'd like to modify Joe's example for my own requirements.

For my demo data below, I have 5 individuals who each have 4 attempts ( = "attempts": '1st','2nd','3rd','4th') at each of 3 different tasks (= "tasks": 'A','B','C').

I'd like to be able to:

1) input my data as a series of 2D numpy arrays, one array per task as shown, which are each composed of the scores of the 5 individuals nested within the 4 sequential attempts.

2) label both the tasks and attempts on the shared x-axis of the plot using strings, saved as sequential items in the lists "tasklist" and "attemptlist" respectively.

3) generalise the solution to make the appropriate plots for any number of individuals, and any number of tasks, each requiring any number of repeated attempts.
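The data layout in point 1 can be sketched for a single task as follows. The scores are hypothetical demo values, and the `Agg` backend is selected only so the snippet runs without a display. The thing to note is that `boxplot` draws one box per column of a 2D array, so an array stored as attempts by individuals needs transposing (or splitting into rows) to get one box per attempt.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores for one task: 4 attempts (rows) x 5 individuals (columns)
scores = np.array([[1, 2, 3, 4, 9],
                   [2, 3, 4, 4, 4],
                   [3, 4, 4, 5, 5],
                   [5, 6, 6, 7, 7]])
n_attempts, n_individuals = scores.shape

fig, ax = plt.subplots()
# boxplot treats each COLUMN of a 2D array as one dataset,
# so transpose to get one box per attempt rather than one per individual
result = ax.boxplot(scores.T, showfliers=False)
n_boxes = len(result["boxes"])  # one box artist per attempt
```

Every row must have the same length for `np.array` to build a rectangular 2D array.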

Edit: 2 Apr 2015:

The only outstanding problem is the seemingly counter-intuitive way that Python dictionaries return their keys in a non-sequential order when using the .keys() method; hence my tasklist keeps coming out as "A,C,B" rather than "A,B,C". The workaround is to import and create an Ordered Dictionary (collections.OrderedDict). This is all new to me, but it would seem to require the item names in my tasklist to be declared twice, as Joe did in his example: once to associate the tasks with the corresponding data matrices, and once to associate the item names in the Ordered Dictionary with the corresponding sequential numeric keys...

I was wondering: is there a method (akin to the .keys() method for regular dictionaries) which would iterate over my data matrices to create an Ordered Dictionary in the order shown ("A,B,C"), without requiring me to enter details of my tasklist twice?
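One possible approach, sketched here with small hypothetical arrays, is to build the `OrderedDict` directly from a list of (task, array) pairs, so each task name is written only once and `.keys()` returns the names in that order. On Python 3.7 and later a plain `dict` also preserves insertion order, so the `OrderedDict` is mainly needed on older versions.

```python
from collections import OrderedDict

import numpy as np

# Each task name appears once, next to its data, in the order it should be plotted.
# The arrays are hypothetical placeholders (2 attempts x 3 individuals).
data = OrderedDict([
    ('A', np.array([[1, 2, 3], [2, 3, 4]])),
    ('B', np.array([[2, 3, 4], [3, 4, 5]])),
    ('C', np.array([[4, 5, 6], [6, 7, 8]])),
])

# Keys come back in insertion order, so no second declaration of the names is needed
tasklist = list(data.keys())
```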

Many thanks

Dave

import matplotlib.pyplot as plt
import numpy as np

data = {}
# each row is one attempt and must hold exactly 5 scores (one per individual)
data['A'] = np.array([[1,2,3,4,9],[2,3,4,4,4],[3,4,4,5,5],[5,6,6,7,7]])
data['B'] = np.array([[2,3,4,4,5],[3,4,5,6,10],[4,5,6,6,7],[5,6,7,7,8]])
data['C'] = np.array([[4,5,6,6,10],[6,7,8,8,8],[7,8,9,9,10],[2,10,11,11,12]])

tasklist = data.keys() #  list of labels for tasks 'A' to 'C' (each containing 4 attempts labelled '1st' to '4th')
attemptlist = ['1st','2nd','3rd','4th'] # list of labels for attempts 1 to 4 within each task

fig, axes = plt.subplots(ncols=len(tasklist), sharey=True)
fig.subplots_adjust(wspace=0)

for ax, task in zip(axes, tasklist):
    # each row of data[task] is one attempt, so pass the rows as separate datasets
    ax.boxplot(list(data[task]), showfliers=False)
    ax.set(xticklabels=attemptlist, xlabel=task)
plt.show()
  • This is still kind of a code dump -- your organization of the problem into steps is fine, so work on each step at a time and ask specific questions about the ones that fail. – cphlewis Apr 01 '15 at 17:26
  • Apologies for the previous shoddy code & thanks for your encouragement: have now edited my code and updated my query. – Dave Apr 02 '15 at 18:12
  • Trim off all that text that no longer applies! Also, I don't see why you need a dict at all if you need it ordered -- you can make a list of tuples and iterate over that: `for label, d in [('A', [...]), ('B', [...])]`. – cphlewis Apr 02 '15 at 18:26

1 Answer

@cphlewis: Many thanks. On your advice I have re-written the code with the data formatted as a list of tuples (task, data), and now have control over the order in which the tasks are plotted.

MWE posted below in case this is helpful for anyone else.

import matplotlib.pyplot as plt

# one (task, scores) tuple per task; each inner list holds one attempt's scores for the 5 individuals
data = [('A', [[1,2,3,4,9],[2,3,4,4,4],[3,4,4,5,5],[5,6,6,7,7]]),
        ('B', [[2,3,4,4,5],[3,4,5,6,10],[4,5,6,6,7],[5,6,7,7,8]]),
        ('C', [[4,5,6,6,10],[6,7,8,8,8],[7,8,9,9,10],[2,10,11,11,12]])]
attemptlist = ['1st','2nd','3rd','4th']  # labels for attempts 1 to 4 within each task
fig, axes = plt.subplots(ncols=len(data), sharey=True)
fig.subplots_adjust(wspace=0)

# tasks are plotted left to right in the order they appear in the list
for ax, (task, scores) in zip(axes, data):
    ax.boxplot(scores, showfliers=False)
    ax.set(xticklabels=attemptlist, xlabel=task)
plt.show()
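To generalise this to any number of tasks, attempts and individuals, the loop could be wrapped in a small function. This is only a sketch: the name `grouped_boxplot` and the demo values are hypothetical, and the `Agg` backend is chosen just so it runs without a display. Passing `squeeze=False` to `plt.subplots` keeps `axes` as a 2D array, so the same loop still works when there is only one task.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so this runs without a display
import matplotlib.pyplot as plt


def grouped_boxplot(data, attemptlist):
    """Draw one panel per task. data is a list of (task, rows) pairs,
    where each row holds the scores of all individuals for one attempt."""
    # squeeze=False always returns a 2D array of axes, even for a single task
    fig, axes = plt.subplots(ncols=len(data), sharey=True, squeeze=False)
    fig.subplots_adjust(wspace=0)
    for ax, (task, rows) in zip(axes[0], data):
        if len(rows) != len(attemptlist):
            raise ValueError("task %r needs one row per attempt label" % (task,))
        # a list of rows gives one box per attempt
        ax.boxplot(list(rows), showfliers=False)
        ax.set(xticklabels=attemptlist, xlabel=task)
    return fig, axes[0]


# Hypothetical demo: 2 tasks, 2 attempts, 3 individuals
demo = [('A', [[1, 2, 3], [2, 3, 4]]),
        ('B', [[2, 3, 4], [3, 4, 6]])]
fig, axes = grouped_boxplot(demo, ['1st', '2nd'])

# The single-task case works with the same call
single_fig, single_axes = grouped_boxplot(demo[:1], ['1st', '2nd'])
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.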
Dave