
I'm making a time series boxplot with the seaborn package, but I can't figure out how to label the outliers.

My data is a DataFrame with 3 columns, [Month, Id, Value], which can be faked like this:

import numpy
import pandas
import seaborn

### Sample Data ###
Month = numpy.repeat(numpy.arange(1,11),10)
Id = numpy.arange(1,101)
Value = numpy.random.randn(100)

### As a pandas DataFrame ###
Ts = pandas.DataFrame({'Value' : Value,'Month':Month, 'Id': Id})

### Time series boxplot ###
ax = seaborn.boxplot(x="Month",y="Value",data=Ts)

I have one boxplot for each month, and I want to label each of the three outliers on the plot with its Id.

Zephyr
KB23
  • Welcome to Stack Overflow. Please take some time to read how to write a [Minimum, Complete and Verifiable Example](http://stackoverflow.com/help/mcve). As it stands, nobody knows any of the code you're using to create these plots, so it's not possible for us to help you properly. – roganjosh Nov 07 '16 at 16:49
  • I believe this post http://stackoverflow.com/questions/35131798/tweaking-seaborn-boxplot answers your query on displaying outliers. – jnic Nov 07 '16 at 16:56
  • Thanks for your answer. I added some details about my issue. @jnic I'm not trying to display outliers but to display outliers labels using the Id column – KB23 Nov 08 '16 at 09:51
  • It could make sense not to use seaborn here, because it does not give access to the underlying features easily. Instead using matplotlib boxplot as [here](https://stackoverflow.com/questions/45354215/matplotlib-boxplot-showing-number-of-occurrences-of-integer-outliers) could be an option. – ImportanceOfBeingErnest Jan 23 '18 at 17:26

1 Answer


First, you need to find which rows of your DataFrame are outliers. You can use this:

from matplotlib.cbook import boxplot_stats

# For each month, keep the rows whose Value is among that month's fliers
outlier_rows = []
for month in Ts['Month'].unique():
    month_df = Ts[Ts['Month'] == month]
    fliers = boxplot_stats(month_df['Value'])[0]['fliers']
    outlier_rows.append(month_df[month_df['Value'].isin(fliers)])
# Combine the per-month results into one DataFrame
outliers_df = pd.concat(outlier_rows)

This creates a DataFrame with the same columns as the original, containing only the outlier rows.
Then you can annotate the plot with each Id like this:

# seaborn draws the boxes at x = 0, 1, 2, ..., so Month 1 sits at x = 0
for row in outliers_df.itertuples():
    ax.annotate(str(row.Id), xy=(row.Month - 1, row.Value), xytext=(2, 2), textcoords='offset points', fontsize=14)
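
As a variation on the two steps above, here is a minimal sketch that finds the outliers with a pandas groupby instead of a loop, and looks up each box's x position instead of relying on `Month - 1`. The helper names (`find_outliers`, `box_positions`, `annotate_outliers`) and the small `demo` frame are made up for illustration. It assumes the usual 1.5 * IQR rule, which is the default for both matplotlib and seaborn, and it assumes seaborn places numeric categories in sorted order starting at x = 0.

```python
import pandas as pd


def find_outliers(df, group_col, value_col, whis=1.5):
    """Return the rows whose value lies more than whis * IQR outside its group's quartiles."""
    grouped = df.groupby(group_col)[value_col]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    mask = (df[value_col] < q1 - whis * iqr) | (df[value_col] > q3 + whis * iqr)
    return df[mask]


def box_positions(df, group_col):
    """Map each group label to a 0-based x position (assumes sorted category order)."""
    labels = sorted(df[group_col].unique().tolist())
    return {label: i for i, label in enumerate(labels)}


def annotate_outliers(ax, outliers, group_col, value_col, label_col, positions):
    """Write each outlier's label next to its point on an existing Axes."""
    for group, value, label in zip(outliers[group_col], outliers[value_col], outliers[label_col]):
        ax.annotate(str(label), xy=(positions[group], value),
                    xytext=(2, 2), textcoords='offset points')


# Hypothetical data: months 3 and 7 only, with one planted outlier (Id 6)
demo = pd.DataFrame({
    'Month': [3] * 6 + [7] * 6,
    'Id': range(1, 13),
    'Value': [1.0, 1.1, 0.9, 1.0, 1.2, 9.0, 2.0, 2.1, 1.9, 2.0, 2.2, 2.1],
})
demo_outliers = find_outliers(demo, 'Month', 'Value')
demo_positions = box_positions(demo, 'Month')
```

With the `Ts` frame and the `ax` returned by `sns.boxplot`, the call would be `annotate_outliers(ax, find_outliers(Ts, 'Month', 'Value'), 'Month', 'Value', 'Id', box_positions(Ts, 'Month'))`. Looking positions up this way should keep the labels aligned even when the months do not start at 1 or have gaps.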

The complete code:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
sns.set_style('darkgrid')

Month = np.repeat(np.arange(1,11),10)
Id = np.arange(1,101)
Value = np.random.randn(100)

Ts = pd.DataFrame({'Value' : Value,'Month':Month, 'Id': Id})

fig, ax = plt.subplots()
sns.boxplot(ax=ax, x="Month",y="Value",data=Ts)

# For each month, keep the rows whose Value is among that month's fliers
outlier_rows = []
for month in Ts['Month'].unique():
    month_df = Ts[Ts['Month'] == month]
    fliers = boxplot_stats(month_df['Value'])[0]['fliers']
    outlier_rows.append(month_df[month_df['Value'].isin(fliers)])
# Combine the per-month results into one DataFrame
outliers_df = pd.concat(outlier_rows)

# seaborn draws the boxes at x = 0, 1, 2, ..., so Month 1 sits at x = 0
for row in outliers_df.itertuples():
    ax.annotate(str(row.Id), xy=(row.Month - 1, row.Value), xytext=(2, 2), textcoords='offset points', fontsize=14)

plt.show()

Output: [image of the resulting plot]

Zephyr