I have a dataset that is a list of lists.

Each list is a category to be plotted as a box plot.

Each entry in a category has up to 9 components, and each component is plotted in its own subplot.

The functions I am using are below and are based on this answer. I pulled them out of my work and added some mock data, so this should be a minimal example.

neonDict = {
    0:0, 1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8
    
    }
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt


def coloredBoxPlot(axis, data,edgeColor,fillColor):
        bp = axis.boxplot(data,vert=False,patch_artist=True)
        for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
            plt.setp(bp[element], color=edgeColor)
            
        for patch in bp['boxes']:
            patch.set(facecolor=fillColor)
    
        return bp 

def plotCalStats(data, prefix='Channel', savedir=None,colors=['#00597c','#a8005c','#00aeea','#007d50','#400080','#e07800'] ):       
    
    csize = mpl.rcParams['figure.figsize']
    cdpi = mpl.rcParams['figure.dpi']
       
    mpl.rcParams['figure.figsize'] = (12,8)
    mpl.rcParams['figure.dpi'] = 1080
    
    pkdata  = []
    labels  = []
    lstyles = []
    
    fg, ax = plt.subplots(3,3)
    for pk in range(len(neonDict)):
        px = pk // 3
        py = pk  % 3
        ax[px,py].set_xlabel('Max Pixel')
        ax[px,py].set_ylabel('')
        ax[px,py].set_title(str(neonDict[pk]) + ' nm')    
        pkdata.append([])
    
    for cat in range(len(data)):
        bp = ''       
        
            
        for acal in data[cat]:
            for apeak in acal.peaks:
                pkdata[apeak].append(acal.peaks[apeak][0])
        
        for pk in range(9):
            px = pk // 3
            py = pk  % 3        
            bp = coloredBoxPlot(ax[px,py], pkdata[pk], colors[cat], '#ffffff')
        
        if len(data[cat]) > 0: 
            #print(colors[cat])
            #print(bp['boxes'][0].get_edgecolor())
            labels.append(prefix+' '+str(cat))
            lstyles.append(bp['boxes'][0])
    
    fg.legend(lstyles,labels) 
    fg.suptitle('Calibration Summary by '+prefix)
    fg.tight_layout()
    if savedir is not None:
        plt.savefig(savedir + 'Boxplots.png')
        
    plt.show()
    
    mpl.rcParams['figure.figsize'] = csize
    mpl.rcParams['figure.dpi']     = cdpi    
    return


class acal:
    def __init__(self):
        self.peaks = {}
        for x in range(9):
            self.peaks[x] = (np.random.randint(20*x,20*(x+1)),)

mockData = [[acal() for y in range(100)] for x in range(6)]

#Some unused channels
mockData[2] = []
mockData[3] = []
mockData[4] = []

plotCalStats(mockData)

The issue is that the plot colors do not match the legend, even though I only add a label if data exists for that category (so there should be no issue with calling boxplot on an empty data set and not getting an appropriate PathPatch).

The printouts verify that the colors are correctly stored in the PathPatch. (I can add my digits -> hex converter if that is questioned.)

Attached is the output. There is a purple box in the plot but no purple in the legend. Purple is channel 4, which is empty.

Any ideas why the labels don't match the actual style? Thanks much!

EDITS: To address the question about 'confusing': I have six categories of data, each category coming from a single event, and each event has 9 components. I want to compare all events, for each individual component and each category, on a single plot as shown below.

Each subplot is an individual component, built from the series of data for each category (Channel).

The link I provided (which, as I said, my code is adapted from) shows how to create a single box plot on one axis for 2 data sets. I've basically done the same thing for 6 data sets on 9 axes, where 3 data sets are empty. They don't have to be; I emptied them to illustrate the issue. With all 6 data sets present, how could you tell the colors are wrong?

Regarding the alpha:

The alpha is always 'ff' when only RGB data is given to matplotlib. If I call get_edgecolor, it returns an (RGBA) tuple where A = 1.0. See the commented-out print statement.
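As a quick check of the alpha claim, here is a minimal sketch using `matplotlib.colors.to_rgba` and `matplotlib.colors.to_hex`. It is not part of the original code; the color value is simply the first entry of the `colors` default above.

```python
from matplotlib import colors as mcolors

# An RGB-only hex string, taken from the default color list above
rgb_hex = '#00597c'

# to_rgba fills in alpha = 1.0 when the input carries no alpha channel
rgba = mcolors.to_rgba(rgb_hex)
print(rgba[3])

# Converting back with keep_alpha=True appends the 'ff' alpha byte
hex8 = mcolors.to_hex(rgba, keep_alpha=True)
print(hex8)
```

So comparing a 7-character input color against an 8-character color read back from an artist requires normalizing the alpha first.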

EDIT2:

If I restrict it down to a single category, it makes the box plot view less confusing.
[Figure: Single example. The box plot color is orange, but the legend says it's blue.]
[Figure: Problem. All colors off.]
[Figure: Working example.]
Feel like this used to work....

marc_s
Chemistpp

1 Answer


I'm not sure why the error presented exactly as it did, but the issue comes from how the data is regrouped before creating the box plots.

The fix was to remove pkdata.append([]) from the loop that sets up the subplots (which runs before the category loop) and instead reset pkdata = [[],[],[],[],[],[],[],[],[]] at the start of each iteration of the category loop. The original version was sending in the data of all previous channels as well.
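The per-category reset can also be written with a list comprehension. The sketch below is only an illustration (the name `n_peaks` is hypothetical, not from the original code), and it shows why the shorter `[[]] * 9` spelling should be avoided for this purpose.

```python
n_peaks = 9  # assumed number of components per calibration

fresh = [[] for _ in range(n_peaks)]   # nine independent lists
shared = [[]] * n_peaks                # nine references to ONE list

fresh[0].append(1.0)
shared[0].append(1.0)

# Only the first bucket of `fresh` received the value...
print([len(b) for b in fresh])
# ...but every "bucket" of `shared` shows it, because they are the same list
print([len(b) for b in shared])
```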

The output is now better. The full solution is below.

Most likely this is what happened: the plot uses the data in pkdata, and pkdata was never cleared, so each category was plotted with its own data plus the data of every previous category. An empty channel (data[cat]) therefore still drew a box from the earlier channels' data. The legend, however, only gets an entry when data[cat] itself is non-empty. So the legend was set up for channels 0, 1 and 5, but the boxes drawn were: channel 0 as 0, 0+1 as 1, 0+1 as 2, 0+1 as 3, 0+1 as 4, and 0+1+5 as 5. Channel 4 (purple) thus had data to plot but was never added to the legend, which gives the impression of a 'misaligned' legend when the real problem is plotted data with no legend entry.

In the single-channel example, the plot is actually six overlapping boxes, one per channel, all drawn from the channel 0 data. The last one drawn, channel 5 (orange), covers all the previous ones, including the original channel 0 box, which is the one the data belongs to and the one that was properly added to the legend.
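The bookkeeping described above can be traced without matplotlib. This is a simplified sketch, not the original code: `channels` is a hypothetical stand-in in which every value is just the index of the channel it came from, and one list plays the role of a single pkdata[pk] bucket.

```python
# Hypothetical stand-in data: channels 2, 3 and 4 are empty, as in the mock data
channels = [[0, 0], [1, 1], [], [], [], [5, 5]]

def plotted_sources(reset_each_channel):
    """Return, per channel, which channels' values would reach boxplot()."""
    bucket = []                      # plays the role of one pkdata[pk] list
    seen = []
    for values in channels:
        if reset_each_channel:
            bucket = []              # the fix: start from an empty bucket
        bucket.extend(values)
        seen.append(sorted(set(bucket)))
    return seen

# Legend entries are only added for channels that have data of their own
legend = [i for i, values in enumerate(channels) if values]

print(plotted_sources(False))  # original behavior: data accumulates
print(plotted_sources(True))   # fixed behavior: each channel plots only itself
print(legend)
```

Without the reset, channels 2 to 4 still plot the accumulated 0+1 data in their own colors while the legend lists only 0, 1 and 5.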

neonDict = {
    0:0, 1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8
    
    }
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

def getHex(r,g,b,a=1.0):
    # Convert RGBA floats in [0, 1] to a '#rrggbbaa' string.
    # round() guards against floating-point results just below an integer,
    # and '02x' zero-pads single-digit hex values on the left.
    s = '#'

    for c in (r, g, b, a):
        s += format(int(round(c * 255)), '02x')

    return s

def getRGB(colstr):   
    try:
        a = ''
        r = int(colstr[1:3],16) / 255
        g = int(colstr[3:5],16) / 255
        b = int(colstr[5:7],16) / 255
        
        if len (colstr) == 7:
            a = 1.0
        else:
            a = int(colstr[7:],16) / 255
        
        return (r,g,b,a)
    except Exception as e:
        print(e)
        raise e
               
    return

def compareHexColors(col1,col2):
    try:
        ## ASSUME #RRGGBB or #RRGGBBAA
        ## If shorter than 9 characters (no alpha), append 'ff' for the alpha
        if len(col1) < 9:
            col1 += 'ff'
        if len(col2) < 9:
            col2 += 'ff'
        
        return col1.lower() == col2.lower()
    except Exception as e:
        raise e
    return False

def coloredBoxPlot(axis, data,edgeColor,fillColor):
        bp = axis.boxplot(data,vert=False,patch_artist=True)
        for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
            plt.setp(bp[element], color=edgeColor)
            
        for patch in bp['boxes']:
            patch.set(facecolor=fillColor)
    
        return bp 

def plotCalStats(data, prefix='Channel', savedir=None,colors=['#00597c','#a8005c','#00aeea','#007d50','#400080','#e07800'] ):       
    
    csize = mpl.rcParams['figure.figsize']
    cdpi = mpl.rcParams['figure.dpi']
       
    mpl.rcParams['figure.figsize'] = (12,8)
    mpl.rcParams['figure.dpi'] = 1080
    
    pkdata  = []
    labels  = []
    lstyles = []
    
    fg, ax = plt.subplots(3,3)
    for pk in range(len(neonDict)):
        px = pk // 3
        py = pk  % 3
        ax[px,py].set_xlabel('Max Pixel')
        ax[px,py].set_ylabel('')
        ax[px,py].set_title(str(neonDict[pk]) + ' nm')    
        
    
    for cat in range(len(data)):
        bp = ''       
        pkdata = [[],[],[],[],[],[],[],[],[]]
        
        for acal in data[cat]:
            for apeak in acal.peaks:
                pkdata[apeak].append(acal.peaks[apeak][0])
                
    
        for pk in range(9):
            px = pk // 3
            py = pk  % 3   
            bp = coloredBoxPlot(ax[px,py], pkdata[pk], colors[cat], '#ffffff')
        
        if len(data[cat]) > 0: 
            print(compareHexColors(colors[cat],getHex(*bp['boxes'][0].get_edgecolor())))
            labels.append(prefix+' '+str(cat))
            lstyles.append(bp['boxes'][0])
    
    fg.legend(lstyles,labels) 
    fg.suptitle('Calibration Summary by '+prefix)
    fg.tight_layout()
    if savedir is not None:
        plt.savefig(savedir + 'Boxplots.png')
        
    plt.show()
    
    mpl.rcParams['figure.figsize'] = csize
    mpl.rcParams['figure.dpi']     = cdpi    
    return


class acal:
    def __init__(self,center):
        self.peaks = {}      
        for x in range(9):
            self.peaks[x] = [10*x + (center) + (np.random.randint(10)-1)/2.0,0,0]

mockData = [[acal(x) for y in range(1000)] for x in range(6)]


#Some unused channels

mockData[2] = []
mockData[3] = []
mockData[4] = []

plotCalStats(mockData)
Chemistpp