I am using matplotlib to generate a boxplot. To draw the plot, `boxplot` must internally calculate statistics such as the medians and means. How can I extract these numerical values?

`ax.boxplot` returns a dictionary of the artists it draws. This includes a list of Line2D objects under the key 'medians', one for each series in the data array.

>>> box = ax.boxplot(data, showmeans=True)
>>> box
{'whiskers': [<matplotlib.lines.Line2D at 0x2580149d0d0>,
  <matplotlib.lines.Line2D at 0x2580149d460>,
  <matplotlib.lines.Line2D at 0x258014a7a00>,
  <matplotlib.lines.Line2D at 0x258014a7d90>, ...etc.... ],
 'caps': [<matplotlib.lines.Line2D at 0x2580149d7f0>,
  <matplotlib.lines.Line2D at 0x2580149db80>,
  <matplotlib.lines.Line2D at 0x2580170a160>,
  <matplotlib.lines.Line2D at 0x2580170a4f0>, ...etc.... ],
 'boxes': [<matplotlib.lines.Line2D at 0x2580410fd00>,
  <matplotlib.lines.Line2D at 0x258014a7670>,
  <matplotlib.lines.Line2D at 0x2580170afa0>,
  <matplotlib.lines.Line2D at 0x25801708910>, ...etc.... ],
 'medians': [<matplotlib.lines.Line2D at 0x2580149df10>,
  <matplotlib.lines.Line2D at 0x2580170a880>,
  <matplotlib.lines.Line2D at 0x258017081f0>,
  <matplotlib.lines.Line2D at 0x25802263b20>, ...etc.... ],
 'fliers': [],
 'means': [<matplotlib.lines.Line2D at 0x258014a72e0>,
  <matplotlib.lines.Line2D at 0x2580170ac10>,
  <matplotlib.lines.Line2D at 0x25801708580>,
  <matplotlib.lines.Line2D at 0x25802263eb0>, ...etc.... ],}

Is there some way I can get the median values (and also means, standard deviations) back from the plot object itself? This could be useful in some cases, to compare it to the values I calculated from the data myself.

feedMe
  • Perhaps you have a good reason not to, but why not just compute the means/medians/etc. yourself assuming you have access to the same data given to the plotting function? – Andre Oct 04 '21 at 16:09
  • I can not reproduce your behaviour. With standard normal distributed data I get `array([0.06614102, 0.06614102])`. Can you include a sample of your data? – Michael Szczesny Oct 04 '21 at 16:09
  • I suspect there are some NaNs in your data. If you create the boxplot [here](https://matplotlib.org/stable/gallery/pyplots/boxplot_demo_pyplot.html#sphx-glr-gallery-pyplots-boxplot-demo-pyplot-py), then `box['medians'][0].get_ydata()` returns `array([50., 50.])`. So I think you either need to check your dataset carefully, or create a [MCVE] which reproduces this issue – tmdavison Oct 04 '21 at 16:15
  • presumably the first box plotted does not display the median line on it? Do any of the boxes have the median on them? – tmdavison Oct 04 '21 at 16:17
  • @MichaelSzczesny Oh dear, I was testing on an edge case where the first series was empty! Apologies for the wild goose chase. Instead of just deleting it, I will edit the question to make it useful as I don't think this is a duplicate topic on SO. – feedMe Oct 04 '21 at 16:17
  • `matplotlib.cbook.boxplot_stats` or as in the accepted answer, as shown in the duplicate – Trenton McKinney Oct 04 '21 at 18:45
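
The two suggestions in the comments (`get_ydata()` on the returned Line2D objects, and `matplotlib.cbook.boxplot_stats`) can be sketched roughly as follows. This is only a minimal illustration, not a confirmed answer from the thread: the sample data, the random seed, and all variable names are made up, and it assumes the default vertical orientation with `showmeans=True` and the default marker-style means. Standard deviation is not one of the statistics a boxplot draws, so the sketch computes it separately with NumPy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

# Hypothetical sample data: 100 rows, 3 columns (each column is one series/box).
rng = np.random.default_rng(0)
data = rng.normal(loc=[0.0, 1.0, 2.0], scale=1.0, size=(100, 3))

fig, ax = plt.subplots()
box = ax.boxplot(data, showmeans=True)

# Option 1: read the values back from the artists that were drawn.
# Each median is a horizontal line, so both of its y-values equal the median.
medians_from_plot = np.array([line.get_ydata()[0] for line in box["medians"]])
# With showmeans=True and marker-style means, each mean is a single point.
means_from_plot = np.array([line.get_ydata()[0] for line in box["means"]])

# Option 2: compute the same statistics directly, without drawing anything.
stats = cbook.boxplot_stats(data)  # one dict per series
medians_from_stats = np.array([s["med"] for s in stats])
means_from_stats = np.array([s["mean"] for s in stats])

# Standard deviation is not part of the boxplot statistics; compute it separately.
stds = data.std(axis=0, ddof=1)

print("medians:", medians_from_plot)
print("means:  ", means_from_plot)
print("stds:   ", stds)
```

Each dict returned by `boxplot_stats` also carries keys such as `q1`, `q3`, `whislo`, `whishi` and `fliers`. As the thread above points out, an empty or NaN-containing series is a likely cause of unexpected values, so it is worth checking the input before comparing numbers.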
