I am using pandas' DataFrame.plot to create a grouped bar chart. I'd like to overlay that chart with two different markers per data point. In practice, the markers represent the same data from prior time periods. I know how to create the markers using kind='line' and linestyle='None', but I don't know how to get them to align with the bars. The desired output would have a colored bar for each data point (from df in the MRE below), and aligned with each bar would be two distinguishable markers representing df2 and df3.

I thought of creating another bar chart that only uses markers, but as far as I know that isn't possible. I also went down a rabbit hole of using yerr, as in this example: How can I add markers on a bar graph in python?

However, it doesn't seem possible to produce two separate markers for the upper and lower error, which is necessary here.

Here is a simple example of what I'm trying to achieve, with the only problem being the x-alignment of the markers.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.figure import Figure

df_data = {'A':np.random.randint(20, 50, 5),
            'B':np.random.randint(10, 50, 5),
            'C':np.random.randint(30, 50, 5),
            'D':np.random.randint(30, 100, 5)}

df2_data = {'A':np.random.randint(20, 50, 5),
            'B':np.random.randint(10, 50, 5),
            'C':np.random.randint(30, 50, 5),
            'D':np.random.randint(30, 100, 5)} 

df3_data = {'A':np.random.randint(20, 50, 5),
            'B':np.random.randint(10, 50, 5),
            'C':np.random.randint(30, 50, 5),
            'D':np.random.randint(30, 100, 5)} 

df = pd.DataFrame(data=df_data)
df2 = pd.DataFrame(data=df2_data)
df3 = pd.DataFrame(data=df3_data)

fig = plt.figure()
bar_colors = ['red', 'blue', 'green', 'goldenrod']
ax = fig.gca()
df2.plot(kind='line', ax=ax, linestyle='None', marker='^',
        color=bar_colors,
        markerfacecolor='white', legend=False)
df3.plot(kind='line', ax=ax, linestyle='None', marker='o',
        color=bar_colors,
        markerfacecolor='white', legend=False)
df.plot(kind='bar', ax=ax, color=bar_colors, width=0.8, rot=0)

plt.show()
Tom

2 Answers


With seaborn's pointplot you could use dodge= to spread the markers, similar to the bars. However, pointplot and barplot interpret the dodge width differently. The width can be adjusted as described in this GitHub issue.
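
The dodge value `.8 - .8 / len(bar_colors)` in the answer's code can be read as the distance between the centres of the first and last bar in a group. Here is a minimal sketch of that arithmetic, assuming the bars of one group share a total width of 0.8 and that pointplot treats dodge as the total spread of the points. `grouped_bar_offsets` is a hypothetical helper written for illustration, not a seaborn function:

```python
import numpy as np


def grouped_bar_offsets(n_groups, total_width=0.8):
    """Centre offsets (relative to the tick) of n_groups side-by-side bars."""
    bar_width = total_width / n_groups
    # centres are spaced one bar width apart and are symmetric around the tick
    return (np.arange(n_groups) - (n_groups - 1) / 2) * bar_width


offsets = grouped_bar_offsets(4)
# distance between the outermost bar centres; this is the value passed as dodge=
span = offsets[-1] - offsets[0]
print(offsets, span)
```

With four hue levels the bar centres sit 0.2 apart, so the markers have to be spread over 0.6 rather than the full 0.8 to land on the middle of each bar.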

There don't seem to be options yet to set the marker face color separately from the marker edge color, but both can be changed afterwards. Also, the legend will include all elements; it can be trimmed to contain only the last 4.

As seaborn prefers its dataframes in "long form", melt needs to be used, together with an explicit column for the original index (created here with reset_index).
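
To make the reshaping step concrete, here is a small sketch on a made-up two-column frame (the name `wide` and its values are illustrative only). It shows how melt(ignore_index=False) followed by reset_index() yields one row per original cell:

```python
import pandas as pd

# tiny stand-in for df / df2 / df3 from the question
wide = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# ignore_index=False keeps the original row labels while melting;
# reset_index() then moves them into a regular column named 'index'
long = wide.melt(ignore_index=False).reset_index()

print(long.columns.tolist())  # ['index', 'variable', 'value']
print(long)
```

The 'index' column becomes the x variable, 'variable' the hue and 'value' the y variable in the seaborn calls.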

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df_data = {'A': np.random.randint(20, 50, 5),
           'B': np.random.randint(10, 50, 5),
           'C': np.random.randint(30, 50, 5),
           'D': np.random.randint(30, 100, 5)}
df2_data = {'A': np.random.randint(20, 50, 5),
            'B': np.random.randint(10, 50, 5),
            'C': np.random.randint(30, 50, 5),
            'D': np.random.randint(30, 100, 5)}
df3_data = {'A': np.random.randint(20, 50, 5),
            'B': np.random.randint(10, 50, 5),
            'C': np.random.randint(30, 50, 5),
            'D': np.random.randint(30, 100, 5)}
df = pd.DataFrame(data=df_data)
df2 = pd.DataFrame(data=df2_data)
df3 = pd.DataFrame(data=df3_data)

bar_colors = ['red', 'blue', 'green', 'goldenrod']
dodge_width = .8 - .8 / len(bar_colors)
ax = sns.pointplot(data=df2.melt(ignore_index=False).reset_index(),
                   x='index', y='value', hue='variable', linestyles='', markers='^',
                   palette=bar_colors, dodge=dodge_width)
sns.pointplot(data=df3.melt(ignore_index=False).reset_index(),
              x='index', y='value', hue='variable', linestyles='', markers='o',
              palette=bar_colors, dodge=dodge_width, ax=ax)
sns.barplot(data=df.melt(ignore_index=False).reset_index(),
            x='index', y='value', hue='variable',
            palette=bar_colors, dodge=True, ax=ax)
handles, labels = ax.get_legend_handles_labels()
leg_num = len(bar_colors)
ax.legend(handles[-leg_num:], labels[-leg_num:])  # use only four last elements for the legend
for markers in ax.collections:
    markers.set_facecolor('white')
    markers.set_linewidth(1)
plt.show()

[Figure: seaborn barplot combined with pointplots]

JohanC
  • I like this answer slightly better than @PonyTale's, as it is a bit more flexible if the number of data points were to change. – Tom Apr 13 '21 at 18:29
  • I also changed these lines to make it more flexible: `leg_num = -len(bar_colors)` followed by `ax.legend(handles[leg_num:], labels[leg_num:])`. – Tom Apr 13 '21 at 18:32

Using matplotlib's bar() and scatter(), I came up with this solution:

import numpy as np
import matplotlib.pyplot as plt

# df, df2 and df3 are the DataFrames from the question
x = np.arange(len(df))  # one tick position per row
width = 0.2             # width of a single bar
# bar centres relative to each tick, one entry per column
offsets = {"A": -1.5 * width, "B": -0.5 * width, "C": 0.5 * width, "D": 1.5 * width}

fig, ax = plt.subplots()

# main dataframe: one bar per column at each tick
for col, offset in offsets.items():
    ax.bar(x + offset, df[col].to_numpy(), width, label=col)

ax.set_xticks(x)
ax.legend()

# df2 and df3: horizontal dash markers at the same x positions as the bars
for prior in (df2, df3):
    for col, offset in offsets.items():
        ax.scatter(x + offset, prior[col].to_numpy(), c="black", marker="_", zorder=2)

plt.show()

This should give something like this:

[Figure: grouped bar chart with dash markers for df2 and df3 over each bar]

You can experiment with different markers and colors.
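
As one way to experiment, the sketch below generalises the same bar() and scatter() idea to any number of columns and gives each prior-period frame its own marker, as the question asks. It is only a sketch under stated assumptions: `plot_bars_with_markers` and its parameters are hypothetical names, the white-filled '^' and 'o' markers are borrowed from the question, and the random data stands in for df, df2 and df3.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_bars_with_markers(df, priors, markers, total_width=0.8, ax=None):
    """Grouped bars for df, with one marker style per frame in priors.

    Hypothetical helper: every frame in priors is assumed to have the
    same columns and number of rows as df.
    """
    if ax is None:
        _, ax = plt.subplots()
    x = np.arange(len(df))
    n_cols = len(df.columns)
    bar_width = total_width / n_cols
    for i, col in enumerate(df.columns):
        # centre of this column's bar at every tick
        centre = x + (i - (n_cols - 1) / 2) * bar_width
        ax.bar(centre, df[col].to_numpy(), bar_width, label=col)
        # reuse the same x positions so the markers sit over the bar
        for prior, marker in zip(priors, markers):
            ax.scatter(centre, prior[col].to_numpy(), marker=marker,
                       facecolors="white", edgecolors="black", zorder=2)
    ax.set_xticks(x)
    ax.set_xticklabels(df.index)
    ax.legend()  # only the bars carry labels, so only they appear
    return ax


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cols = list("ABCD")
    df = pd.DataFrame(rng.integers(20, 100, size=(5, 4)), columns=cols)
    df2 = pd.DataFrame(rng.integers(20, 100, size=(5, 4)), columns=cols)
    df3 = pd.DataFrame(rng.integers(20, 100, size=(5, 4)), columns=cols)
    plot_bars_with_markers(df, [df2, df3], ["^", "o"])
    plt.show()
```

Because the offsets are derived from the number of columns, adding or removing a column does not require editing the x positions by hand.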

PonyTale