
I am trying to plot a stacked area graph with these values:

import pandas as pd

y1=[26.8,24.97,25.69,24.07]
y2=[21.74,19.58,20.7,21.09]
y3=[13.1,12.45,12.75,10.79]
y4=[9.38,8.18,8.79,6.75]
y5=[12.1,10.13,10.76,8.03]
y6=[4.33,3.73,3.78,3.75]

df = pd.DataFrame([y1,y2,y3,y4,y5,y6])

cumsum = df.cumsum()
cumsum

I was able to create the area plot; however, I don't know how to add the specific values to the graph.

import matplotlib.pyplot as plt
import seaborn as sns

labels = ["Medical", "Surgical", "Physician Services", "Newborn", "Maternity", "Mental Health"]
x = [1,2,3,4]
years = [2011,2012,2013,2014]

fig, ax = plt.subplots()
plt.title("Overall, inpatient costs have decreased in 2011")
ax.stackplot(x, y1,y2,y3,y4,y5,y6, labels=labels, colors = sns.color_palette("Blues")[::-1])
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
plt.show()  # the original notebook called display() here

This is the current output, which does not match the desired output:

[Image: current output]

The output should look something like this.

[Image: desired output]

Trenton McKinney
fiona

3 Answers

  • Since there is already a DataFrame, use pandas.DataFrame.plot with kind='area'.
    • However, the DataFrame needs to be constructed as shown below.
  • The question is very similar to Labels (annotate) in pandas area plot.
  • To place each annotation correctly, use the cumulative sum of the values at each x-tick as the y position. Annotations can be made with .annotate or .text:
    • ax.annotate(text=f'${a:0.2f}', xy=(x, cs[i]))
    • ax.text(x=x, y=cs[i], s=f'${a:0.2f}')
  • Tested in python 3.8.11, pandas 1.3.3, matplotlib 3.4.3
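To make the cumulative-sum step concrete, here is a minimal pure-Python sketch (no plotting) that computes the label positions for one x-tick. The function name `label_positions` and its `center` option are hypothetical, introduced only for illustration: `center=False` gives the top edge of each band (the running total, which is what the annotation loop below uses), and `center=True` gives the middle of each band, an optional alternative placement.

```python
from itertools import accumulate


def label_positions(values, center=False):
    """Return a y position for each value in one stacked column.

    center=False: top edge of each band (the running total).
    center=True:  vertical middle of each band (running total minus half the value).
    """
    tops = list(accumulate(values))  # running total = top of each stacked band
    if center:
        return [top - v / 2 for top, v in zip(tops, values)]
    return tops


# 2011 values for the six categories, taken from the question
values_2011 = [26.8, 21.74, 13.1, 9.38, 12.1, 4.33]
print([round(y, 2) for y in label_positions(values_2011)])
# [26.8, 48.54, 61.64, 71.02, 83.12, 87.45]
```

Each position is simply the sum of that value and every value stacked beneath it, which is why the annotation loop needs a cumulative sum rather than the raw values.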
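import pandas as pd
import seaborn as sns

# the lists from the question
y1 = [26.8, 24.97, 25.69, 24.07]
y2 = [21.74, 19.58, 20.7, 21.09]
y3 = [13.1, 12.45, 12.75, 10.79]
y4 = [9.38, 8.18, 8.79, 6.75]
y5 = [12.1, 10.13, 10.76, 8.03]
y6 = [4.33, 3.73, 3.78, 3.75]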

# create the DataFrame
values = [y1, y2, y3, y4, y5, y6]
labels = ["Medical", "Surgical", "Physician Services", "Newborn", "Maternity", "Mental Health"]
years = [2011, 2012, 2013, 2014]
data = dict(zip(labels, values))
df = pd.DataFrame(data=data, index=years)

# display(df)
#       Medical  Surgical  Physician Services  Newborn  Maternity  Mental Health
# 2011    26.80     21.74               13.10     9.38      12.10           4.33
# 2012    24.97     19.58               12.45     8.18      10.13           3.73
# 2013    25.69     20.70               12.75     8.79      10.76           3.78
# 2014    24.07     21.09               10.79     6.75       8.03           3.75

# plot
ax = df.plot(kind='area', xticks=df.index, title='Overall, inpatient costs have decreased in 2011',
             color=sns.color_palette("Blues")[::-1], figsize=(10, 6), ylabel='Cost (USD)')
ax.legend(bbox_to_anchor=(1.07, 1.02), loc='upper left')  # move the legend
ax.set_frame_on(False)  # remove all the spines
ax.tick_params(left=False)  # remove the y tick marks
ax.set_yticklabels([])  # remove the y labels
ax.margins(x=0, y=0)  # remove the margin spacing

# annotate
for x, v in df.iterrows():
    cs = v.cumsum()[::-1]  # get the cumulative sum of the row and reverse it to provide the correct y position
    for i, a in enumerate(v[::-1]):  # reverse the row values for the correct annotation
        # cs is indexed by column name, so use .iloc for a positional lookup
        ax.annotate(text=f'${a:0.2f}', xy=(x, cs.iloc[i]))

[Image: resulting annotated area plot]
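For comparison, a sketch of the same idea using the question's own `ax.stackplot` call, without pandas. Assumptions: matplotlib's default color cycle instead of the seaborn palette, the non-interactive Agg backend so it runs headless, and the hypothetical output filename `stacked_area_labels.png`. It also centers each label in its band (`ha`/`va="center"` at running total minus half the value), which is an alternative to the top-edge placement above, not what the answer does.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from itertools import accumulate

years = [2011, 2012, 2013, 2014]
labels = ["Medical", "Surgical", "Physician Services", "Newborn", "Maternity", "Mental Health"]
series = [  # one list per category, values from the question
    [26.8, 24.97, 25.69, 24.07],
    [21.74, 19.58, 20.7, 21.09],
    [13.1, 12.45, 12.75, 10.79],
    [9.38, 8.18, 8.79, 6.75],
    [12.1, 10.13, 10.76, 8.03],
    [4.33, 3.73, 3.78, 3.75],
]

fig, ax = plt.subplots(figsize=(10, 6))
ax.stackplot(years, *series, labels=labels)

# For each year, walk up the stack: the running total is the top of a band,
# and subtracting half of the value gives that band's vertical middle.
for j, year in enumerate(years):
    column = [s[j] for s in series]
    for value, top in zip(column, accumulate(column)):
        ax.text(year, top - value / 2, f"${value:0.2f}",
                ha="center", va="center", fontsize=8)

ax.set_xticks(years)
ax.set_frame_on(False)  # remove all the spines
ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left")
fig.tight_layout()
fig.savefig("stacked_area_labels.png")  # or plt.show() in an interactive session
```

Labels at the first and last year sit on the plot edge, so you may want `ha="left"` / `ha="right"` there.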

  • I think a stacked bar plot is a cleaner presentation of the data because the data is discrete, not continuous. The lines in the area plot imply a continuous dataset.
ax = df.plot(kind='bar', stacked=True, color=sns.color_palette("Blues")[::-1], rot=0,
             title='Overall, inpatient costs have decreased in 2011', ylabel='Cost (USD)', figsize=(10, 6))
ax.legend(bbox_to_anchor=(1, 0.5), loc='center left', frameon=False)
ax.set_frame_on(False)  # remove all the spines
ax.tick_params(left=False, bottom=False)  # remove the x and y tick marks
ax.set_yticklabels([])  # remove the y labels

for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    # labels = [f'${h:0.2f}' if (h := v.get_height()) > 0 else '' for v in c]  # use this line with python >= 3.8
    labels = [f'${v.get_height():0.2f}' if v.get_height() > 0 else '' for v in c]

    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center', fontsize=8)

[Image: resulting annotated stacked bar plot]
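If you prefer plain matplotlib over `DataFrame.plot`, a stacked bar chart can be sketched with `ax.bar` and its `bottom` parameter, then labeled with `Axes.bar_label` (matplotlib 3.4 or later). Assumptions: default colors rather than the seaborn palette, the Agg backend, string year labels for the x-axis, and the hypothetical filename `stacked_bar_labels.png`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

years = [2011, 2012, 2013, 2014]
labels = ["Medical", "Surgical", "Physician Services", "Newborn", "Maternity", "Mental Health"]
series = [  # one list per category, values from the question
    [26.8, 24.97, 25.69, 24.07],
    [21.74, 19.58, 20.7, 21.09],
    [13.1, 12.45, 12.75, 10.79],
    [9.38, 8.18, 8.79, 6.75],
    [12.1, 10.13, 10.76, 8.03],
    [4.33, 3.73, 3.78, 3.75],
]
x = [str(y) for y in years]  # categorical x-axis

fig, ax = plt.subplots(figsize=(10, 6))
bottom = [0.0] * len(years)  # running total of the bars drawn so far
for name, values in zip(labels, series):
    # draw this category on top of the previous ones
    bars = ax.bar(x, values, bottom=bottom, label=name)
    # label each segment, skipping empty ones
    ax.bar_label(bars, labels=[f"${v:0.2f}" if v > 0 else "" for v in values],
                 label_type="center", fontsize=8)
    bottom = [b + v for b, v in zip(bottom, values)]

ax.set_frame_on(False)  # remove all the spines
ax.set_yticks([])  # remove the y ticks and labels
ax.tick_params(bottom=False)  # remove the x tick marks
ax.legend(bbox_to_anchor=(1, 0.5), loc="center left", frameon=False)
fig.tight_layout()
fig.savefig("stacked_bar_labels.png")  # or plt.show() in an interactive session
```

Keeping `bottom` as an explicit running total is the same cumulative-sum idea the area-plot annotations rely on.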

Trenton McKinney
  • @bench You're welcome. I'm glad this works for you. Just curious, do you like the area plot or bar plot better? – Trenton McKinney Oct 09 '21 at 17:36
  • The stacked bar plot is definitely a lot more visual and easy to read. However I do like how the area plot flows and shows the increase and decreases of the cost. – fiona Oct 09 '21 at 17:44

You could add the following snippet at the end of your code:

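for i, c in df.items():  # iteritems() was removed in pandas 2.0
    v2 = 0  # running total for this x position
    for v in c:
        v2 += v
        ax.text(i+1, v2, f'${v:.2f}')  # column i is plotted at x = i + 1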

Output:

[Image: area plot with the values added via ax.text]

mozway

I changed these lines in your code:

fig, ax = plt.subplots(figsize=(10,7))
ax.stackplot(years, y1,y2,y3,y4,y5,y6, labels=labels, colors = sns.color_palette("Blues")[::-1])
plt.legend(bbox_to_anchor=(1.1, 1), loc="upper left")

Then add these lines to get the desired result:

df2 = df.cumsum()

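for id_col, col in df2.items():  # iteritems() was removed in pandas 2.0
    prev_val = 0
    for val in col:
        # band value = difference of consecutive cumulative sums, placed at the band's top;
        # the label is passed positionally so it works before and after matplotlib 3.3
        ax.annotate('${}'.format(round((val - prev_val),2)), xy=(years[id_col], val), weight='bold')
        prev_val = val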

plt.xticks(years)

Output:

[Image: resulting annotated area plot]

Whole code:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

y1=[26.8,24.97,25.69,24.07]
y2=[21.74,19.58,20.7,21.09]
y3=[13.1,12.45,12.75,10.79]
y4=[9.38,8.18,8.79,6.75]
y5=[12.1,10.13,10.76,8.03]
y6=[4.33,3.73,3.78,3.75]
labels = ["Medical", "Surgical", "Physician Services", 
          "Newborn", "Maternity", "Mental Health"]
years = [2011,2012,2013,2014]
fig, ax = plt.subplots(figsize=(10,7))
plt.title("Overall, inpatient costs have decreased in 2011", weight='bold')
for side in ['right', 'top', 'bottom', 'left']:
    ax.spines[side].set_visible(False)
ax.stackplot(years, y1,y2,y3,y4,y5,y6, labels=labels, 
             colors = sns.color_palette("Blues")[::-1])

df2 = pd.DataFrame([y1,y2,y3,y4,y5,y6]).cumsum()
for id_col, col in df2.items():  # iteritems() was removed in pandas 2.0
    prev_val = 0
    for val in col:
        # pass the label positionally: the keyword is `s` before matplotlib 3.3 and `text` from 3.3 on
        ax.annotate('${}'.format(round((val - prev_val),2)), xy=(years[id_col], val), weight='bold')
        prev_val = val

plt.xticks(years)
plt.xlabel('Year')
plt.ylabel('Cost (USD)')
plt.legend(bbox_to_anchor=(1.1, 1), loc="upper left")
plt.show()
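Which keyword `Axes.annotate` accepts for the label depends on the matplotlib version: the first parameter was renamed from `s` to `text` in matplotlib 3.3, so `text=` fails on older versions and `s=` is deprecated on newer ones. Passing the string positionally avoids the keyword entirely. A minimal sketch, assuming the Agg backend and using a single data point from the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.stackplot([2011, 2012], [26.8, 24.97])

# label passed positionally: no `s=` or `text=` keyword, so the call
# does not depend on which side of the 3.3 rename you are on
ann = ax.annotate("$26.80", xy=(2011, 26.8), weight="bold")
print(ann.get_text())
```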
I'mahdi
  • Hey, I tried the whole code and this error appeared: TypeError: annotate() missing 1 required positional argument: 's'. Do you know how to solve it? – fiona Oct 14 '21 at 16:09
  • @fiona thanks a lot. Change this line to `ax.annotate(s='${}'.format(round((val - prev_val),2)), xy=(years[id_col],(val)), weight='bold')`; which keyword works (`s` or `text`) depends on your matplotlib version. – I'mahdi Oct 14 '21 at 17:02