This question stems from the original post "Heatmap with circles indicating size of population".

I am trying to replicate this with my dataframe; however, my circles do not align with the plot. Secondly, I also want to create a legend that indicates the value corresponding to each circle size.

x = {'ID': {0: 'GO:0002474',
            1: 'GO:0052548',
            2: 'GO:0002483',
            3: 'GO:0043062',
            4: 'GO:0060333'},
     'TERM': {0: 'antigen processing and presentation of peptide antigen via MHC class I',
              1: 'regulation of endopeptidase activity',
              2: 'antigen processing and presentation of endogenous peptide antigen',
              3: 'extracellular structure organization',
              4: 'interferon-gamma-mediated signaling pathway'},
     'Count': {0: 11, 1: 17, 2: 5, 3: 15, 4: 6},
     'Ratio': {0: 18.64, 1: 14.53, 2: 8.47, 3: 12.82, 4: 10.17},
     'pvalue': {0: -15.83, 1: -11.39, 2: -9.67, 3: -9.05, 4: -7.41},
     'qvalue': {0: -11.63, 1: -7.49, 2: -6.52, 3: -5.63, 4: -4.55},
     'Label': {0: 'NODAL', 1: 'NODAL', 2: 'NODAL', 3: 'SHARED', 4: 'NODAL'}}

A2780_GOBP= pd.DataFrame(x)

Attempted Code:

ylabels = A2780_GOBP["TERM"]
xlabels = ["GFP","SHARED","NODAL"]
x, y = np.meshgrid(np.arange(len(xlabels)), np.arange(len(ylabels)))
s = A2780_GOBP["Count"].values
c = A2780_GOBP["pvalue"].values

fig, ax = plt.subplots()

R = s/s.max()/2
circles = [plt.Circle((j,i), radius=r) for r, j, i in zip(R.flat, x.flat, y.flat)]
col = PatchCollection(circles, array=c.flatten(), cmap=cmap)
ax.add_collection(col)

ax.set(xticks=np.arange(3), yticks=np.arange(10),
       xticklabels=xlabels, yticklabels=ylabels)
ax.set_xticks(np.arange(3+1)-0.5, minor=True)
ax.set_yticks(np.arange(10+1)-0.5, minor=True)
ax.grid(which='minor')


fig.colorbar(col)
plt.show()

Output

Any help would be greatly appreciated!

thejahcoop
  • @Mr. T How do I import a data frame to here? – thejahcoop Jan 08 '21 at 15:38
  • Print `df.head(N).to_dict()`, copy paste. More information [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). However, it is probably indeed irrelevant. I thought you copied fully the code, but the culprit is most probably `np.meshgrid(np.arange(len(xlabels)), np.arange(len(ylabels)))`. If it is indeed just this typo, I suggest deleting this question. – Mr. T Jan 08 '21 at 15:42
  • It somewhat fixed it, but the circles are not positioned at the right x and y coordinates. I imagine `circles = [plt.Circle((j,i), radius=r) for r, j, i in zip(R.flat, x.flat, y.flat)]` is the issue, but I am unfamiliar with how to use strings as x and y. – thejahcoop Jan 08 '21 at 15:52
  • OK, so not trivial. I will have a look at it. – Mr. T Jan 08 '21 at 15:57
  • @Mr. T Greatly appreciated! – thejahcoop Jan 08 '21 at 15:57

2 Answers

The problem is that the copied code fills every cell of the grid, whereas your data does not necessarily have an entry for each cell. We have to look up where each circle has to be plotted:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
import pandas as pd

x= {'ID': {0: 'GO:0002474',
      1: 'GO:0052548',
      2: 'GO:0002483',
      3: 'GO:0043062',
      4: 'GO:0060333'},
     'TERM': {0: 'antigen processing and presentation of peptide antigen via MHC class I',
      1: 'regulation of endopeptidase activity',
      2: 'antigen processing and presentation of endogenous peptide antigen',
      3: 'extracellular structure organization',
      4: 'interferon-gamma-mediated signaling pathway'},
     'Count': {0: 11, 1: 17, 2: 5, 3: 15, 4: 6},
     'Ratio': {0: 18.64, 1: 14.53, 2: 8.47, 3: 12.82, 4: 10.17},
     'pvalue': {0: -15.83, 1: -11.39, 2: -9.67, 3: -9.05, 4: -7.41},
     'qvalue': {0: -11.63, 1: -7.49, 2: -6.52, 3: -5.63, 4: -4.55},
     'Label': {0: 'NODAL', 1: 'GFP', 2: 'NODAL', 3: 'SHARED', 4: 'NODAL'}}

A2780_GOBP= pd.DataFrame(x)
cmap = "plasma"
 
#retrieve unique labels
ylabels = A2780_GOBP["TERM"].unique().tolist()
xlabels = A2780_GOBP["Label"].unique().tolist()
xn = len(xlabels)
yn = len(ylabels)
#retrieve size and color information    
s = A2780_GOBP["Count"].values
c = A2780_GOBP["pvalue"].values


#preparation of the figure with its grid
fig, ax = plt.subplots(figsize=(10, 5))
ax.set_xlim(-0.5, xn-0.5)
ax.set_ylim(-0.5, yn-0.5)
ax.set(xticks=np.arange(xn), yticks=np.arange(yn),
       xticklabels=xlabels, yticklabels=ylabels)
ax.set_xticks(np.arange(xn)-0.5, minor=True)
ax.set_yticks(np.arange(yn)-0.5, minor=True)
ax.grid(which='minor')
#ensure circles are displayed as circles
ax.set_aspect("equal", "box")

#create circles patches and colorbar
R = s/s.max()/2
circles = [plt.Circle((xlabels.index(A2780_GOBP.loc[i, "Label"]), ylabels.index(A2780_GOBP.loc[i, "TERM"])), radius=r) for i, r in enumerate(R)]
col = PatchCollection(circles, array=c, cmap=cmap)
ax.add_collection(col)
fig.colorbar(col)

plt.show()

Sample output: [image]

The code does not check the integrity of your original dataframe, i.e., that each Label-TERM pair occurs only once.
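A minimal sketch of such an integrity check, assuming a dataframe with the same `Label`, `TERM` and `Count` columns as in the question. The toy data and the rule for resolving duplicates (keep the row with the largest `Count`) are illustrative assumptions, not part of the code above.

```python
import pandas as pd

# Hypothetical toy frame; column names follow the question's dataframe.
df = pd.DataFrame({
    "TERM": ["term a", "term b", "term a", "term c"],
    "Label": ["NODAL", "GFP", "NODAL", "SHARED"],
    "Count": [11, 17, 5, 15],
})

# keep=False flags every row of a repeated (Label, TERM) pair,
# not only the second and later occurrences.
dupes = df[df.duplicated(subset=["Label", "TERM"], keep=False)]
if not dupes.empty:
    print("Repeated Label/TERM pairs:")
    print(dupes)

# One possible resolution (an assumption): keep the largest Count per pair.
# reset_index restores a 0..n-1 index, which the plotting code relies on
# when it combines enumerate() with .loc[i, ...].
deduped = (df.sort_values("Count", ascending=False)
             .drop_duplicates(subset=["Label", "TERM"])
             .sort_index()
             .reset_index(drop=True))
```

Without such a step, two rows sharing a Label-TERM pair would simply be drawn as overlapping circles in the same cell.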

Mr. T
  • To create a second legend with the sizes of circles, would I simply use plt.legend()? – thejahcoop Jan 09 '21 at 14:08
  • You are a scientist. Didn't you try to test whether it works? (Hint: It doesn't because we do not provide any labels.) [This](https://stackoverflow.com/a/59381424/8881141) and [this](https://stackoverflow.com/a/58485655/8881141) show you principles of how to create the legend from scratch but this probably has to be adapted to your current code. – Mr. T Jan 09 '21 at 14:59
  • I realized how naive this was to ask. Thank you for the links! – thejahcoop Jan 09 '21 at 16:36
  • Au contraire. Most questions only look naive in retrospect; the point is to learn from the answers. Good luck implementing the legend. If it does not work - ask another question. – Mr. T Jan 09 '21 at 16:47
  • I figured it out (see below). I am sure there is a better way to create a list of patches, but I will work on that later. Thanks again! – thejahcoop Jan 10 '21 at 14:43
  • One can always optimize minute details but imho the graph looks publication-ready. My only question would be: Shouldn't the label entries `17, 10, 3` be generated based on your data? Oh, and another thing - have you checked that the sizes are correct? The max circle in the legend looks smaller than the max circle in the figure. – Mr. T Jan 10 '21 at 14:52
  • they are, see smax/mid/min, however, I hadn't converted them to a list yet so the same code can be applied to adjacent data frames. This could also be a result of using Line2D instead of the circle patches generated for the figure. I did come across a PatchHandler module, but haven't familiarized myself with it yet. – thejahcoop Jan 10 '21 at 15:12
  • I am afraid they are not. [Marker sizes and patch radius will most certainly be differently interpreted](https://stackoverflow.com/questions/14827650/pyplot-scatter-plot-marker-size) (and they are: I tried it with my smaller sample, and the sizes did not match at all). You should instead insert circle patches of the correct size as legend handles. Not difficult. Sorry if it was me who set you on the wrong track. – Mr. T Jan 10 '21 at 15:18

Adapted from @Mr. T's answer to include a legend generator (this reuses the imports, the dataframe, and `cmap` defined there):

from matplotlib.legend_handler import HandlerPatch
import matplotlib.patches as mpatches

ylabels = A2780_GOBP["TERM"].unique().tolist()
xlabels = A2780_GOBP["Label"].unique().tolist()
xn = len(xlabels)
yn = len(ylabels)    
s = A2780_GOBP["Count"].values
c = A2780_GOBP["pvalue"].values

fig, ax = plt.subplots(figsize=(20,10))
ax.set_xlim(-0.5, xn-0.5)
ax.set_ylim(-0.5, yn-0.5)
ax.set(xticks=np.arange(xn), yticks=np.arange(yn), yticklabels=ylabels)
ax.set_xticklabels(xlabels, rotation='vertical')
ax.set_xticks(np.arange(xn)-0.5, minor=True)
ax.set_yticks(np.arange(yn)-0.5, minor=True)
ax.grid(which='minor')
ax.set_aspect("equal", "box")

R = s/s.max()/2
circles = [plt.Circle((xlabels.index(A2780_GOBP.loc[i, "Label"]), ylabels.index(A2780_GOBP.loc[i, "TERM"])), radius=r) for i, r in enumerate(R)]
col = PatchCollection(circles, array=c, cmap=cmap)
sc=ax.add_collection(col)
# set_label() returns None, so keep the colorbar handle separately
cbar = fig.colorbar(col)
cbar.set_label('$-log_{10}(p-value)$', rotation=270, size=16, labelpad=15)

smax=s.max()
smin=s.min()
smid=(smax+smin)/2
texts = ["3","10","17"]


class HandlerEllipse(HandlerPatch):
    def create_artists(self, legend, orig_handle,
                       xdescent, ydescent, width, height, fontsize, trans):
        center = 0.5 * width - 0.5 * xdescent, 0.5 * height - 0.5 * ydescent
        p = mpatches.Ellipse(xy=center, width=orig_handle.width,
                                        height=orig_handle.height)
        self.update_prop(p, orig_handle, legend)
        p.set_transform(trans)
        return [p]
    
# legend handles; the center is a dummy value because HandlerEllipse recomputes it
c = [mpatches.Ellipse((0, 0), width=smin, height=smin, color="grey"),
     mpatches.Ellipse((0, 0), width=smid, height=smid, color="grey"),
     mpatches.Ellipse((0, 0), width=smax, height=smax, color="grey"),
    ]

legend = ax.legend(c,texts, handler_map={mpatches.Ellipse: HandlerEllipse()},title="Number of Proteins",bbox_to_anchor=(3.50, 0.82, 1.0, .102),fontsize="large")
plt.setp(legend.get_title(),fontsize='large')
plt.show()

Output: [image]
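As the comments note, the legend labels are hard-coded and the legend circles do not automatically match the circles in the plot. The sketch below is one possible way to address both, not the approach used above: it swaps the `Ellipse` handles for `Line2D` markers, derives the labels from the data, and converts the data-space radius to marker size in points. It assumes the `Count` values from the question and the same axis limits as the first answer, and the conversion only holds for the figure size at the time of the call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

s = np.array([11, 17, 5, 15, 6])  # the Count column from the question
fig, ax = plt.subplots(figsize=(10, 5))
ax.set_xlim(-0.5, 2.5)
ax.set_ylim(-0.5, 4.5)
ax.set_aspect("equal", "box")

# Legend values taken from the data instead of hard-coded strings
values = [float(s.min()), float(s.min() + s.max()) / 2, float(s.max())]
labels = [f"{v:g}" for v in values]
radii = [v / s.max() / 2 for v in values]  # same scaling as R = s/s.max()/2

# Draw once so the equal-aspect layout is applied, then measure how many
# points (1/72 inch) one data unit spans on the canvas.
fig.canvas.draw()
x0, _ = ax.transData.transform((0, 0))
x1, _ = ax.transData.transform((1, 0))
points_per_unit = (x1 - x0) * 72 / fig.dpi

# The markersize of an "o" marker is its diameter in points.
handles = [Line2D([], [], linestyle="none", marker="o", color="grey",
                  markeredgewidth=0, markersize=2 * r * points_per_unit)
           for r in radii]
legend = ax.legend(handles, labels, title="Number of Proteins",
                   loc="center left", bbox_to_anchor=(1.05, 0.5),
                   labelspacing=2, borderpad=1.5, handletextpad=2)
```

If the figure is resized afterwards, `points_per_unit` has to be recomputed and the legend rebuilt; the spacing arguments are guesses that may need tuning for other datasets.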

thejahcoop
  • Woohoo, go team! – Mr. T Jan 10 '21 at 14:43
  • improvements could be to list the smin/mid/max as strings in the text list. Likewise, a cleaner way to create c. – thejahcoop Jan 11 '21 at 05:35
  • I apologize that I said it would be easy to include patches into the legend. I found out afterward that this is not trivial for circles. Glad you have seen the workaround for non-supported patches. Imho the code looks good now. Back to the pipettes. – Mr. T Jan 11 '21 at 13:08
  • No apology needed. It was fun to learn how to do it. – thejahcoop Jan 11 '21 at 13:47
  • One thing I am finding is that it doesn't transfer to similar datasets with different values. The circles in the legend have to be scaled manually. I am going to play around with it, but I may have to post another question to figure this out. – thejahcoop Jan 12 '21 at 03:57
  • @thejahcoop I am trying this and I am having trouble getting the circles to scale properly. Any suggestions? The legend also looks quite odd: just one giant circle, and I can't see the other two smaller circles. What about having annotations on the circles for one of the variables, say the pvalue? Thank you! – eurojourney Jun 07 '22 at 16:28
  • @eurojourney yes, I realized this after applying to a different data set. It has been a while since I have used this but I believe you have to modify the R variable. Maybe somebody wants to revisit this – thejahcoop Jun 16 '22 at 14:22