Disclaimer: perhaps my research was lacking, but I didn't find an answer exactly tailored to my question.

I have a set of datasets with specific sizes which were made available in different years. For example, in 2005 we had dataset A (size 500), B (size 100) and C (size 789) and then in 2013 we had dataset H (size 1500), I (size 300) and J (size 47).

I would like the size on the vertical (y) axis, while the horizontal (x) axis would be hierarchical: ordered first by year and, within each year, by size.

Additionally, I would like each x-axis label to be a string of the form "YYYY dataset_name-size". For the example above, the x-axis would have the following labels, in this order:

2005 B-100

2005 A-500

2005 C-789

2013 J-47

2013 I-300

2013 H-1500
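
The ordering and label format described above can be sketched in a few lines. This is only an illustration: the `records` list of `(year, name, size)` tuples and the `ordered_labels` helper are assumed names, not part of any library.

```python
# Hypothetical input: (year, dataset name, size) tuples from the example above.
records = [
    (2005, "A", 500), (2005, "B", 100), (2005, "C", 789),
    (2013, "H", 1500), (2013, "I", 300), (2013, "J", 47),
]


def ordered_labels(records):
    """Sort by year, then by size, and format each label as 'YYYY name-size'."""
    ordered = sorted(records, key=lambda r: (r[0], r[2]))
    labels = [f"{year} {name}-{size}" for year, name, size in ordered]
    sizes = [size for _, _, size in ordered]
    return labels, sizes


labels, sizes = ordered_labels(records)
print(labels)
```

The resulting `sizes` would be the bar heights and `labels` the tick labels, in the same order.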

The examples I found usually assume that the same elements are present in every group (if that were the case, each dataset name would appear once per year, which is not what I want).

Consider this example from The Python Graph Gallery:

# libraries
import numpy as np
import matplotlib.pyplot as plt

# set width of bar
barWidth = 0.25

# set height of bar
bars1 = [12, 30, 1, 8, 22]
bars2 = [28, 6, 16, 5, 10]
bars3 = [29, 3, 24, 25, 17]

# Set position of bar on X axis
r1 = np.arange(len(bars1))
r2 = [x + barWidth for x in r1]
r3 = [x + barWidth for x in r2]

# Make the plot
plt.bar(r1, bars1, color='#7f6d5f', width=barWidth, edgecolor='white', label='var1')
plt.bar(r2, bars2, color='#557f2d', width=barWidth, edgecolor='white', label='var2')
plt.bar(r3, bars3, color='#2d7f5e', width=barWidth, edgecolor='white', label='var3')

# Add xticks on the middle of the group bars
plt.xlabel('group', fontweight='bold')
plt.xticks([r + barWidth for r in range(len(bars1))], ['A', 'B', 'C', 'D', 'E'])

# Create legend & Show graphic
plt.legend()
plt.show()

It produces the following plot:

[image: grouped bar chart produced by the example above]

My plot would be similar, except that not every variable would appear in every category. It would look like this:

[image: sketch of the desired plot, with a different number of bars in each group]

The years would appear where the single bold letters are. The text above the years would not necessarily be numbers; it could be the dataset names.

However, I need the groups to be equally spaced, as shown.
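
One possible way to keep groups equally spaced when they hold different numbers of bars is to fix each group's center at 0, 1, 2, ... and spread that group's bars symmetrically around it. The sketch below assumes this layout; `grouped_positions` and its parameters are hypothetical names chosen for illustration.

```python
def grouped_positions(group_sizes, bar_width=0.25):
    """Return (positions, centers) for groups with differing bar counts.

    Group centers sit at 0, 1, 2, ... so the groups are equally spaced;
    the bars of each group are laid out symmetrically around its center.
    Bars of neighboring groups overlap if a group's size times bar_width
    exceeds 1, so bar_width has to be chosen accordingly.
    """
    positions, centers = [], []
    for g, n in enumerate(group_sizes):
        center = float(g)
        centers.append(center)
        positions.append([center + (j - (n - 1) / 2) * bar_width for j in range(n)])
    return positions, centers


# Three groups holding 3, 1 and 2 bars respectively.
positions, centers = grouped_positions([3, 1, 2], bar_width=0.25)
```

Each inner list of `positions` could then be passed to `plt.bar` together with that group's heights, and `centers` to `plt.xticks` together with the year labels.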

Any help appreciated.

Blitzkoder
  • Is this the actual structure of your pandas dataframe - one column year as a number (2005), one column category and value as a string ("B-100")? Then you have to first create a dataframe that matplotlib can work with. There are several solutions regarding multiindex labeling [here](https://stackoverflow.com/a/43547282/8881141) or [here](https://stackoverflow.com/q/19184484/8881141) for instance. And there is this example for [color coding of categories.](https://stackoverflow.com/a/31950461/8881141) – Mr. T Jul 17 '18 at 19:05
  • @Mr.T I believe [this example you provided](https://stackoverflow.com/questions/19184484/how-to-add-group-labels-for-bar-charts-in-matplotlib) looks like what I need the most. Except that for my case, all bars would have different colors and there would be no "Shelf" category subdivision, there are only two levels in my case: first the year, then the dataset name. – Blitzkoder Jul 17 '18 at 20:35

1 Answer

I actually solved this problem some time ago but am only now posting the solution. Essentially, I used another Stack Overflow user's approach to hierarchical bar plotting:

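import collections.abc
from typing import Dict, List, Optional

import matplotlib.pyplot as plt


def mk_groups(data: Dict) -> Optional[List]:
    """Turn a nested dict into one list of (label, width_or_value) pairs per level."""
    try:
        newdata = data.items()
    except AttributeError:
        # Not a dict: this is a leaf value, so there is nothing to group.
        return None

    thisgroup = []
    groups = []
    for key, value in newdata:
        newgroups = mk_groups(value)
        if newgroups is None:
            # Leaf: keep the value itself (the bar height).
            thisgroup.append((key, value))
        else:
            # Group: keep the number of leaves it spans.
            thisgroup.append((key, len(newgroups[-1])))
            if groups:
                groups = [g + n for n, g in zip(newgroups, groups)]
            else:
                groups = newgroups
    return [thisgroup] + groups


def add_line(ax, xpos: float, ypos: float) -> None:
    """Draw a short vertical separator below the axis, in axes coordinates."""
    line = plt.Line2D([xpos, xpos], [ypos + .1, ypos],
                      transform=ax.transAxes, color='black')
    line.set_clip_on(False)
    ax.add_line(line)


def get_hierarchy_element_count(data: Dict) -> int:
    """Count the leaf values (bars) in a nested dict."""
    acc = 0
    for _, val in data.items():
        # collections.Mapping was removed in Python 3.10; use collections.abc.
        if isinstance(val, collections.abc.Mapping):
            acc = acc + get_hierarchy_element_count(val)
        else:
            acc = acc + 1
    return acc


def get_group_color_list(data: Dict, colors: List) -> List:
    """Repeat one color per top-level group, once for each bar in that group."""
    ind = 0
    new_colors = []
    for _, val in data.items():
        # Top-level values that are not dicts are skipped and get no color.
        if isinstance(val, collections.abc.Mapping):
            sub_group_count = get_hierarchy_element_count(val)
            new_colors = new_colors + [colors[ind]] * sub_group_count
            ind = ind + 1
    return new_colors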
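
The usage below calls `label_group_bar`, whose definition is not shown in this answer. The following is a minimal, self-contained sketch of what such a function might look like, not the original implementation. The `flatten` helper, the gray fallback color, the `"C0"` to `"C9"` color cycle, and the vertical offsets for the group labels are all assumptions made for illustration. It walks the nested dictionary itself rather than reusing `mk_groups`, so that groups nested to different depths are handled.

```python
from collections.abc import Mapping

import matplotlib.pyplot as plt


def flatten(data, depth=0, start=0, spans=None):
    """Walk the nested dict and return (leaves, spans).

    leaves: list of (label, value), one per bar, in plotting order.
    spans:  list of (depth, label, first_bar_index, bar_count), one per group.
    """
    spans = [] if spans is None else spans
    leaves = []
    for key, value in data.items():
        if isinstance(value, Mapping):
            first = start + len(leaves)
            sub_leaves, _ = flatten(value, depth + 1, first, spans)
            spans.append((depth, key, first, len(sub_leaves)))
            leaves.extend(sub_leaves)
        else:
            leaves.append((key, value))
    return leaves, spans


def label_group_bar(ax, data, per_group_coloring=False):
    """Draw one bar per leaf and label every group below the tick labels.

    Assumes data holds at least one leaf value. Returns the bar colors used.
    """
    leaves, spans = flatten(data)
    labels, heights = zip(*leaves)
    n = len(leaves)
    xs = list(range(1, n + 1))

    if per_group_coloring:
        # One color per top-level group; bars outside any group stay gray.
        colors = ["gray"] * n
        top_level = [s for s in spans if s[0] == 0]
        for i, (_, _, first, count) in enumerate(top_level):
            colors[first:first + count] = [f"C{i % 10}"] * count
    else:
        # One color per bar, cycling through matplotlib's default colors.
        colors = [f"C{i % 10}" for i in range(n)]

    ax.bar(xs, heights, color=colors, align="center")
    ax.set_xticks(xs)
    ax.set_xticklabels([str(label) for label in labels])
    ax.set_xlim(0.5, n + 0.5)
    ax.yaxis.grid(True)

    # Group labels are placed in axes coordinates (0..1 across the plot),
    # with the outermost level on the lowest row.
    max_depth = max((s[0] for s in spans), default=0)
    scale = 1.0 / n
    for depth, label, first, count in spans:
        ypos = -0.2 - 0.1 * (max_depth - depth)
        left, right = first * scale, (first + count) * scale
        ax.text((left + right) / 2, ypos, str(label),
                ha="center", transform=ax.transAxes)
        for xpos in (left, right):
            line = plt.Line2D([xpos, xpos], [ypos, ypos + 0.1],
                              transform=ax.transAxes, color="black")
            line.set_clip_on(False)
            ax.add_line(line)
    return colors
```

With long, rotated tick labels the group labels may need a larger offset than the `-0.2` assumed here, together with a matching `fig.subplots_adjust(bottom=...)`.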

With these functions, one can define a dictionary of dictionaries and pass it to `label_group_bar` like so:

plot_map = {
    2004: {'dataset A': 50,'dataset B': 30,'dataset C': 70,'dataset ZZZ': 10,},
    2007: {'dataset 111': 80,'dataset B3': 5},
    2010: {'dataset solitude': 40},
    2015: {
        'Group A': {'x' : 40, 'y': 60}, 
        'Group B': {'x' : 45, 'y': 45}
    }
}

fig = plt.figure(figsize=(12, 6))
plt.suptitle('Each group with its own color')
ax = fig.add_subplot(1,1,1)

label_group_bar(ax, plot_map, per_group_coloring = True)
fig.subplots_adjust(bottom=0.3)
plt.xticks(rotation=90)
plt.ylabel("Yet another scale")

plt.show()

This produces a figure where each group of bars has its own color:

[image: bar chart in which each group of bars has its own color]

If you want each bar to have its own color instead, omit the `per_group_coloring` argument:

fig = plt.figure(figsize=(12, 6))
plt.suptitle('Each bar with its own color')
ax = fig.add_subplot(1,1,1)

label_group_bar(ax, plot_map)
fig.subplots_adjust(bottom=0.3)
plt.xticks(rotation=90)
plt.ylabel("Yet another scale")

plt.show()

[image: bar chart in which each bar has its own color]

I hope this is useful to someone else.

Blitzkoder