
I'm trying to find a way to plot the mean and individual values (not a bar chart or boxplot), but using a nested x axis. The dataframe I have is as follows:

{'Product': {0: 'Apple',
  1: 'Apple',
  2: 'Apple',
  3: 'Apple',
  4: 'Apple',
  5: 'Apple',
  6: 'Apple',
  7: 'Apple',
  8: 'Apple',
  9: 'Apple',
  10: 'Apple',
  11: 'Apple',
  12: 'Apple',
  13: 'Apple',
  14: 'Apple',
  15: 'Orange',
  16: 'Orange',
  17: 'Orange',
  18: 'Orange',
  19: 'Orange',
  20: 'Orange',
  21: 'Orange',
  22: 'Orange',
  23: 'Orange',
  24: 'Orange',
  25: 'Orange',
  26: 'Orange',
  27: 'Orange',
  28: 'Orange',
  29: 'Orange',
  30: 'Banana',
  31: 'Banana',
  32: 'Banana',
  33: 'Banana',
  34: 'Banana',
  35: 'Banana',
  36: 'Banana',
  37: 'Banana',
  38: 'Banana',
  39: 'Banana',
  40: 'Banana',
  41: 'Banana',
  42: 'Banana',
  43: 'Banana',
  44: 'Banana'},
 'Tester': {0: 'Anne',
  1: 'Anne',
  2: 'Anne',
  3: 'Anne',
  4: 'Anne',
  5: 'Steve',
  6: 'Steve',
  7: 'Steve',
  8: 'Steve',
  9: 'Steve',
  10: 'Paula',
  11: 'Paula',
  12: 'Paula',
  13: 'Paula',
  14: 'Paula',
  15: 'Anne',
  16: 'Anne',
  17: 'Anne',
  18: 'Anne',
  19: 'Anne',
  20: 'Steve',
  21: 'Steve',
  22: 'Steve',
  23: 'Steve',
  24: 'Steve',
  25: 'Paula',
  26: 'Paula',
  27: 'Paula',
  28: 'Paula',
  29: 'Paula',
  30: 'Anne',
  31: 'Anne',
  32: 'Anne',
  33: 'Anne',
  34: 'Anne',
  35: 'Steve',
  36: 'Steve',
  37: 'Steve',
  38: 'Steve',
  39: 'Steve',
  40: 'Paula',
  41: 'Paula',
  42: 'Paula',
  43: 'Paula',
  44: 'Paula'},
 'Result': {0: 5,
  1: 7,
  2: 4,
  3: 9,
  4: 10,
  5: 3,
  6: 6,
  7: 1,
  8: 9,
  9: 11,
  10: 2,
  11: 3,
  12: 5,
  13: 3,
  14: 2,
  15: 7,
  16: 8,
  17: 7,
  18: 6,
  19: 5,
  20: 9,
  21: 8,
  22: 9,
  23: 6,
  24: 7,
  25: 3,
  26: 7,
  27: 9,
  28: 7,
  29: 1,
  30: 11,
  31: 12,
  32: 11,
  33: 10,
  34: 9,
  35: 12,
  36: 12,
  37: 14,
  38: 8,
  39: 6,
  40: 7,
  41: 4,
  42: 5,
  43: 7,
  44: 8}}
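For anyone who wants to reproduce this, the dictionary above looks like the output of `df.to_dict()`, so passing it to `pd.DataFrame(...)` should rebuild the frame directly. A more compact sketch that builds the same 45 rows is shown here; the variable names `products`, `testers` and `results` are only for illustration:

```python
import pandas as pd

products = ["Apple", "Orange", "Banana"]
testers = ["Anne", "Steve", "Paula"]

# Results in row order: 5 values per tester, 3 testers per product
results = [
    5, 7, 4, 9, 10,     3, 6, 1, 9, 11,     2, 3, 5, 3, 2,    # Apple
    7, 8, 7, 6, 5,      9, 8, 9, 6, 7,      3, 7, 9, 7, 1,    # Orange
    11, 12, 11, 10, 9,  12, 12, 14, 8, 6,   7, 4, 5, 7, 8,    # Banana
]

df = pd.DataFrame({
    # Each product repeated 15 times (3 testers x 5 results)
    "Product": [p for p in products for _ in range(15)],
    # Within each product, each tester repeated 5 times
    "Tester": [t for _ in products for t in testers for _ in range(5)],
    "Result": results,
})
```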

I'd like to be able to reproduce the following graph, which I can plot in Minitab:

Minitab plot

I've looked at various packages (seaborn as a preference, followed by matplotlib and hvplot), but from what I understand they cannot plot a multi-level (hierarchical) x axis unless it's for a boxplot or bar chart.

I have tried sns.catplot with the parameter col set to 'Product', but this splits the data into separate plots, and I'd like it all in one chart.

Trenton McKinney
  • This is likely a duplicate of [How to add a mean line to a seaborn stripplot or swarmplot](https://stackoverflow.com/q/67481900/7758804): `p = sns.stripplot(x="Product", y="Result", data=df, size=4, hue='Tester')` and then add the boxplot in the duplicate. [See code and plot](https://i.stack.imgur.com/eQ0Mk.png). Change the mean marker size / shape / color with https://stackoverflow.com/q/54132989/7758804 – Trenton McKinney Mar 11 '23 at 21:19
  • `meanline=False` and `meanprops={'marker': 's', "markerfacecolor":"blue", "markeredgecolor":"blue"}` – Trenton McKinney Mar 11 '23 at 21:24

2 Answers


Update

You can use scatterplot:

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('whitegrid')
fig, ax = plt.subplots(figsize=(12, 8))

# Combine both levels into a single categorical label, e.g. "Anne\nApple"
df['Label'] = df['Tester'] + '\n' + df['Product']
# Individual values in gray
sns.scatterplot(data=df, x='Label', y='Result', color='gray', ax=ax)
# Mean per label in blue, drawn on the same axes
dfm = df.groupby(['Label'], as_index=False)['Result'].mean()
sns.scatterplot(data=dfm, x='Label', y='Result', color='blue', s=50, ax=ax)

ax.set_title('Individual Value Plot of Result')
ax.set_xlabel(None)
plt.show()

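The combined label above repeats the product name under every tester. If you want the product shown once per group, closer to the Minitab layout, here is a minimal sketch in plain matplotlib. It places each (Product, Tester) pair at a numeric x position and draws the outer labels with `ax.text`. The helper name `nested_axis_plot` is hypothetical, and the small frame holds only the first two results per tester for Apple and Orange from the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd


def nested_axis_plot(df, outer="Product", inner="Tester", value="Result"):
    """Hypothetical helper: individual values and group means on a two-level x axis."""
    outer_levels = list(df[outer].unique())
    inner_levels = list(df[inner].unique())
    n_inner = len(inner_levels)

    # Numeric x position of each row: one slot per (outer, inner) pair
    pos = (df[outer].map(outer_levels.index) * n_inner
           + df[inner].map(inner_levels.index))
    means = df[value].groupby(pos).mean()

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.scatter(pos, df[value], color="gray", label="Individual values")
    ax.scatter(means.index, means.values, color="blue", marker="s", s=50,
               label="Mean")

    # Inner level as regular tick labels
    ax.set_xticks(range(len(outer_levels) * n_inner))
    ax.set_xticklabels(inner_levels * len(outer_levels))

    # Outer level: one centred label per group, plus separators between groups
    for i, name in enumerate(outer_levels):
        centre = i * n_inner + (n_inner - 1) / 2
        ax.text(centre, -0.08, name, ha="center", va="top",
                transform=ax.get_xaxis_transform())
        if i > 0:
            ax.axvline(i * n_inner - 0.5, color="lightgray", linewidth=1)

    ax.set_ylabel(value)
    ax.legend()
    return fig, ax, means


# Subset of the question's data (first two results per tester)
sample = pd.DataFrame({
    "Product": ["Apple"] * 4 + ["Orange"] * 4,
    "Tester": ["Anne", "Anne", "Steve", "Steve"] * 2,
    "Result": [5, 7, 3, 6, 7, 8, 9, 8],
})
fig, ax, means = nested_axis_plot(sample)
```

With the full dataframe, `nested_axis_plot(df)` should give nine slots, three per product. The `-0.08` offset for the outer labels is a guess that may need adjusting for other figure sizes or fonts.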


Old answer

You can use catplot or swarmplot:

sns.set_style('whitegrid')
sns.catplot(data=df, x='Product', y='Result', hue='Tester')


Update: A hack

sns.catplot(data=df.assign(Label=df['Tester'] + '\n' + df['Product']), 
            x='Label', y='Result')


Corralien

I would do this as a faceted plot. If you don't want it to look like there are separate plots (IMO, the plot is easier to parse if it does), you can remove the space between the subplots:

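import seaborn as sns

tester_order = ["Anne", "Steve", "Paula"]

# One narrow facet per product, with individual values as a gray swarm
g = sns.catplot(
    data=df, kind="swarm",
    x="Tester", y="Result", col="Product",
    color=".6", size=6, zorder=1,
    height=4, aspect=.5, order=tester_order,
)
# Overlay the per-tester mean in each facet, without connecting lines or error bars
# (newer seaborn releases deprecate join=False in favor of linestyle="none")
g.map_dataframe(
    sns.pointplot, x="Tester", y="Result",
    join=False, errorbar=None,
    order=tester_order,
)
# Remove the horizontal gap so the facets read as a single chart
g.figure.subplots_adjust(wspace=0)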


mwaskom