How can I add labels to each step of this cumulative step function as shown in the attached screenshot?

import pandas as pd

df = pd.DataFrame({"year": range(1990, 2020, 4), "name": list("abcdefgh")})
df.year.hist(cumulative=True)

Janosh
  • @JohanC The label indicates which item it was that increased the cumulation. – Janosh Aug 02 '20 at 13:45
  • A specific example could be a list of EV charging stations in a region and when they opened. The plot is supposed to show the total number of available stations over time and display some type of charging station identifier above each step in the function to make it clear which station newly opened at that point in time. – Janosh Aug 02 '20 at 15:14
  • You're right, a histogram isn't what I want here. It just looks similar. – Janosh Aug 02 '20 at 15:15
  • It is not a duplicate, but you might get what you need from [this question](https://stackoverflow.com/q/19073683/6692898) – RichieV Aug 02 '20 at 15:21

1 Answer

You can iterate through the generated bars. Each bar's x coordinate and width give the range it covers on the x-axis, and its height gives the y value for the label. That x range selects the matching rows from the year column of the dataframe, and each row's year serves as the x position of its label.

Some care is needed, as the last value could fall outside the nominal range of the last bar. Increasing the y margin also helps to fit all the text into the plot.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'year': range(1990, 2020, 4), 'name': [*'abcdefgh']})
ax = df.year.hist(cumulative=True)

for rect in ax.patches:
    # x range covered by this bar, and its cumulative height
    left = rect.get_x()
    right = left + rect.get_width()
    height = rect.get_height()
    if rect is ax.patches[-1]:  # make sure the last range is wide enough to include the last date
        right += 1
    # rows whose year falls inside this bar's range
    df2 = df[(df['year'] >= left) & (df['year'] <= right)]
    for year, name in zip(df2['year'], df2['name']):
        # the trailing newline pushes the name above the top of the bar
        ax.text(year, height, f'{name}\n', ha='left', va='center')
ax.margins(y=0.15)  # we need more room for the last label
plt.show()

resulting plot

PS: To write only one label per bar, you could add a break to the inner for loop:

    for year, name in zip(df2['year'], df2['name']):
        ax.text(year, height, f'{name}\n', ha='left', va='center')
        break
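
The bin lookup done in the loop above can also be sketched without reading the patch geometry. This assumes `hist` is left at its default of 10 equal-width bins, which matplotlib computes with `numpy.histogram`; the names `label_y` and `labels` are only illustrative. `np.digitize` on the interior edges puts the last year into the last bin, so no manual widening of the last range is needed here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"year": range(1990, 2020, 4), "name": list("abcdefgh")})

# Assumption: the same 10 equal-width bins that hist() uses by default
counts, edges = np.histogram(df["year"], bins=10)
cumulative = np.cumsum(counts)  # bar heights of the cumulative histogram

# Bin index per row; using only the interior edges keeps the maximum
# year (which sits exactly on the last edge) inside the last bin
bin_index = np.digitize(df["year"].to_numpy(), edges[1:-1])

# Label position for each row: x is the year, y is the height of its bar
label_y = cumulative[bin_index]
labels = list(zip(df["year"], label_y, df["name"]))
```

Each `(year, height, name)` tuple in `labels` can then be passed to `ax.text` in the same way as in the loop above.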

For the step plot, you could use:

import pandas as pd

df = pd.DataFrame({'year': range(1990, 2020, 4), 'name': [*'abcdefgh']})
ax = df.reset_index().plot(x='year', y='index', drawstyle="steps", legend=False)

for ind, (year, name) in enumerate(zip(df['year'], df['name'])):
    ax.text(year, ind, f' {name}', ha='left', va='center')

example step plot
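
As a variation on the step plot, here is a minimal sketch that plots a running count starting at 1 and places each label with `ax.annotate` at a fixed pixel offset, so the text does not depend on the axis scale. The `count` column, the `where="post"` choice and the 3-point offset are assumptions made for this illustration, not part of the code above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"year": range(1990, 2020, 4), "name": list("abcdefgh")})
df = df.sort_values("year").reset_index(drop=True)
df["count"] = range(1, len(df) + 1)  # running total after each item is added

fig, ax = plt.subplots()
# where="post": the count jumps at each year and holds until the next one
ax.step(df["year"], df["count"], where="post")

for year, count, name in zip(df["year"], df["count"], df["name"]):
    # put each name just above and to the right of the top of its step
    ax.annotate(name, xy=(year, count), xytext=(3, 3),
                textcoords="offset points", ha="left", va="bottom")

ax.margins(x=0.1, y=0.15)  # leave room for the last label
ax.set_xlabel("year")
ax.set_ylabel("cumulative count")
```

To view the result, drop the `matplotlib.use("Agg")` line and call `plt.show()`, or save the figure with `fig.savefig(...)`.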

JohanC
  • Thanks! Here's what I should have used instead of `df.hist`: `df.reset_index().plot(x='year', y='index', drawstyle="steps")`. – Janosh Aug 02 '20 at 16:02