Problem: I want to make a stacked bar chart where the segments within each bar are sorted by value, from maximum (bottom) to minimum (top).

I have a DataFrame with a time series along the rows (24 hours) and 12 different categories along the columns, something like this:

import numpy as np
import pandas as pd
bz = ['NO1','NO2','NO3','NO4','NO5','DK1','DK2','FI','SE1','SE2','SE3','SE4']
df = pd.DataFrame(columns=bz, index=range(0, 24, 1), data=np.random.randint(0, 100, size=(24, 12)))

I was unable to sort and plot the values in the same line of code, so I resorted to hard-coding each hour and sorting it from highest to lowest value, like so:

hour1 = df.loc[0].sort_values(ascending=False)
hour2 = df.loc[1].sort_values(ascending=False)
hour3 = df.loc[2].sort_values(ascending=False)
...

But then I couldn't figure out how to plot them in a stack properly.

Desired outcome:

Each category in bz is stacked and sorted by value (max at the bottom, min at the top) for each successive hour, where each x-value corresponds to one of the variables hour1, hour2, etc.

2 Answers

I'm not aware of any simple way to do this. That said, I was able to get the desired result (at least what I think your desired result is).

First, I created a numpy array of the sorted indices for each of the rows. Then I looped through the data frame (generally bad practice) and created the stacked bar chart. The stacking is done by first ordering the row according to the sort and computing the cumulative sum, making sure to start at 0 and ignore the last value to set the bottom of each bar. The colors are then set by the ordering, to ensure each column is colored the same every loop.
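
As a minimal sketch of that bottoms calculation for a single row (the four values are made up for illustration and are not taken from the question's data):

```python
import numpy as np

# Hypothetical values for one hour, one entry per category.
row = np.array([30, 80, 10, 50])

# Positions of the values from largest to smallest.
order = np.argsort(-row)            # [1, 3, 0, 2]
heights = row[order]                # [80, 50, 30, 10]

# Each segment starts where the previous ones end: prepend 0, drop the last sum.
bottoms = np.hstack(([0], np.cumsum(heights)[:-1]))   # [0, 80, 130, 160]
```

The `order` array is also what selects the colors, so a category keeps its color no matter where it ends up in the stack.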

Because of this "hacky" method, you also need to create a custom legend, which I did by following the matplotlib tutorial on that. I also had to move the legend outside so it didn't block the data, which I did following this answer. For the tick marks, I set them to say the corresponding hour in the format "hourXX", where "XX" is a 0-padded number starting at 01.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np

rng = np.random.default_rng(10)

bz = ["NO1", "NO2", "NO3", "NO4", "NO5", "DK1",
      "DK2", "FI", "SE1", "SE2", "SE3", "SE4"]
df = pd.DataFrame(columns=bz,
                  index=range(0, 24, 1),
                  data=rng.integers(0, 100, size=(24, 12)))
# Column positions for each row, ordered from largest to smallest value.
sorted_indices = np.argsort(-df.to_numpy(), axis=1)

cmap = plt.get_cmap("tab20")
fig, ax = plt.subplots()
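for index, row in df.iterrows():
    # Reorder this hour's values from largest to smallest.
    row = row.iloc[sorted_indices[index]]
    # Each segment starts at the running total of the segments below it.
    bottoms = np.hstack(([0], np.cumsum(row)[:-1]))
    # Color by original column position so each category keeps its color.
    ax.bar(index, row, bottom=bottoms, color=cmap(sorted_indices[index]))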

ticks = [f"hour{n:02}" for n in range(1, len(df)+1)]
ax.set_xticks(np.arange(len(df)), ticks, rotation=90)

legend_elements = []
for index, name in enumerate(bz):
    legend_elements.append(Patch(facecolor=cmap(index), label=name))

ax.legend(handles=legend_elements, bbox_to_anchor=(1.04, 1), loc="upper left")
fig.tight_layout()
plt.show()
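
The per-row loop can probably be avoided as well. The following is only a sketch and not part of the solution above: `sorted_stack_bottoms` is a hypothetical helper name, and it assumes the values are non-negative. It computes, for all hours at once, where each category's segment should start, so that each category could then be drawn with a single `ax.bar` call.

```python
import numpy as np


def sorted_stack_bottoms(values):
    """Return the bottom of each cell's segment when every row is stacked largest-first.

    values: 2-D array, rows are hours and columns are categories (assumed non-negative).
    """
    values = np.asarray(values)
    # Column order per row, largest value first; a stable sort keeps ties in column order.
    order = np.argsort(-values, axis=1, kind="stable")
    heights = np.take_along_axis(values, order, axis=1)
    # Exclusive cumulative sum: each segment starts where the larger ones end.
    sorted_bottoms = np.cumsum(heights, axis=1) - heights
    # Scatter the bottoms back to the original column positions.
    bottoms = np.empty_like(sorted_bottoms)
    np.put_along_axis(bottoms, order, sorted_bottoms, axis=1)
    return bottoms


# Tiny made-up example: 2 hours, 3 categories.
example = np.array([[3, 10, 5],
                    [7, 2, 9]])
example_bottoms = sorted_stack_bottoms(example)
```

With the `df` from above, calling `sorted_stack_bottoms(df.to_numpy())` and then `ax.bar(df.index, df[name], bottom=bottoms[:, j], color=cmap(j), label=name)` for each `j, name in enumerate(bz)` should give the same picture, and `ax.legend()` could then build the legend from the labels without custom patches.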

IIUC, you need to create a new DataFrame that contains the sorted columns and then plot:

>>> pd.concat([df[col].sort_values(ignore_index=True) for col in df.columns],axis=1).plot.bar(stacked=True)

  • Not quite right but nice code. I need each row sorted separately, with the order of the time series (represented by the rows) maintained. For example, the first hour (row 0) contains 12 entries, one for each bz; these are sorted by value so that the max is at the bottom and the min at the top. This is then plotted at x=0. The next hour is sorted and plotted similarly, so the order of the hours/rows is maintained but the bz's are plotted hierarchically for each x-value. – Tom Jul 20 '23 at 13:11
  • Your description is confusing. You say each "row" and then say bottom to top. Rows go left to right. Do you have your expected output as a picture or table? – not_speshal Jul 20 '23 at 13:20
  • Sorry, I mean the bottom of the plot. Each row represents an hour in a day. Order each row by max to min value and then plot each row in a stacked bar chart, one for each x-value, so I can see the pattern of which bz has the largest contribution for each hour. – Tom Jul 20 '23 at 13:33
  • Furthermore, you ask for a table. If I could somehow put each of the variables I've described above, i.e. `hour1, hour2`, etc., in a table where I maintain the `bz` category AND the time series value (maybe a multilevel index somehow?), then this is what you should imagine. @not_speshal does that make more sense now? – Tom Jul 21 '23 at 08:03