
I have a pandas DataFrame that looks like this, and I'm using it to graph the life of a character over a period of days. The days column is really "days since birth." For this example, the character was born on May 26th, 2023.

   days  health  months
0     0    30    May 23
1     1    30    
2     2    20    
3     3    20    
4     4    10    
5     5    10    
6     6    10    Jun 23
7     7    10    
8     8    10    
9     9     0    

This is the seaborn BarPlot.

[image: bar plot of the example DataFrame]

I have significantly simplified the number of days the character is alive for the sake of reproducibility, but in my normal code, the number of days is in the hundreds, possibly thousands.

Here is a graph of my normal case.

[image: bar plot of the full dataset]

As you can see, this graph is far more crowded with bars, which seems to hurt rendering performance noticeably, even with only a few hundred days.

So my question is this: can I convert the BarPlot to the seaborn equivalent of a histogram with the way my DataFrame is set up?

The ideal would look something like the image below (ignore my bad graphic design job). The red lines are only there to highlight each section of the histogram; I am not looking to add them.

Note, I also need to be able to keep the month labels in the same place as they are above, since there could be a section of time where the character's health stays the same for multiple months.

[image: mock-up of the desired histogram-style plot, with red lines marking each section]

My code is minimal for the chart, but the size of the DataFrame seems to be causing the slow rendering time.

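import matplotlib.pyplot as plt
import seaborn as sns

# one bar per row; the months column supplies the tick labels
ax = sns.barplot(dataframe, x='days', y='health', color='blue')
ax.set_xticklabels(dataframe.months)

plt.xticks(rotation=45)
plt.show()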

Here's a line for the example DataFrame, for easy reproducibility:

df = pd.DataFrame({'days': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'health': [30, 30, 20, 20, 10, 10, 10, 10, 10, 0], 'months': ["May 23", " ", " ", " ", " ", " ", "Jun 23", " ", " ", " "]})

Thank you in advance.

rpanai
SanguineL
  • It seems like it's supposed to be a continuous x-axis, in which case this should be a line plot. – Trenton McKinney May 25 '23 at 17:09
  • Ah! I was only looking for histogram options. The answer was right in front of me. Thanks, @TrentonMcKinney! – SanguineL May 25 '23 at 17:14
  • Also going to add this comment: Since I had `days` as my x-value, I had hundreds of `xticklabels`, which seems to have impacted my rendering speed even more than just using the barplot method. I changed my `df` so that empty labels are `""` instead of `" "`, and this helped immensely. – SanguineL May 25 '23 at 19:54

2 Answers

  • It seems like it's supposed to be a continuous x-axis, in which case, this should be a line plot.
  • You don't want to plot strings on the x-axis. This results in a tick for every string because the labels are categorical.
    • Combine 'months' and 'days', add a year, and create a datetime Dtype column to use as the x-axis.
  • pandas will format the ticklabels differently based on the range of the dates.
  • Customize the exact look and location of the datetime axis, by looking at the many questions on SO dealing with formatting datetime xtick labels.
  • Tested in python 3.11.2, pandas 2.0.1, matplotlib 3.7.1
import pandas as pd

# sample dataframe
df = pd.DataFrame({'days': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'health': [30, 30, 20, 20, 10, 10, 10, 10, 10, 0], 'months': ["May 23", " ", " ", " ", " ", " ", "Jun 23", " ", " ", " "]})

# replace the blank strings with NA, so forward fill will work
df = df.replace(' ', pd.NA)

# forward-fill the months column so every row carries a label
df['months'] = df['months'].ffill()

# convert to a datetime with a specific year
# note: this format reads 'May 23' as month and day of month
df['date'] = pd.to_datetime('2023 ' + df['months'], format='%Y %b %d')

# update the date column by adding the days as an offset (vectorized)
df['date'] = df['date'] + pd.to_timedelta(df['days'], unit='D')

# plot
ax = df.plot(x='date', y='health', rot=0, figsize=(10, 5))

# if the xticks labels need to be repositioned horizontally get the ticks and labels
ticks, labels = list(zip(*[(v.get_position()[0], v.get_text()) for v in ax.get_xticklabels()]))

# reset the ticks and labels
ax.set_xticks(ticks, labels, ha='center')

[image: line plot of health by date]
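For the datetime axis customization mentioned in the list above, one option is matplotlib's `matplotlib.dates` locators and formatters. This is only a sketch: it assumes consecutive daily dates starting at the birth date given in the question (May 26th, 2023), and that one tick at the start of each month, labelled like the original `months` column, is what is wanted. It plots with matplotlib directly rather than `df.plot`.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# assumption: consecutive daily dates starting at the birth date from the question
dates = pd.date_range('2023-05-26', periods=10, freq='D')
health = [30, 30, 20, 20, 10, 10, 10, 10, 10, 0]

fig, ax = plt.subplots(figsize=(10, 5))
# plot with matplotlib directly so the matplotlib.dates locators apply
ax.plot(dates, health)

# one major tick on the first day of each month, labelled like 'Jun 23'
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %y'))
```

Call `plt.show()` to display the figure. With the ten-day sample only the June tick should fall inside the plotted range; over hundreds of days, each month start gets one label regardless of how many rows there are.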

df

   days  health  months       date
0     0      30  May 23 2023-05-23
1     1      30  May 23 2023-05-24
2     2      20  May 23 2023-05-25
3     3      20  May 23 2023-05-26
4     4      10  May 23 2023-05-27
5     5      10  May 23 2023-05-28
6     6      10  Jun 23 2023-06-29
7     7      10  Jun 23 2023-06-30
8     8      10  Jun 23 2023-07-01
9     9       0  Jun 23 2023-07-02

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   days    10 non-null     int64         
 1   health  10 non-null     int64         
 2   months  10 non-null     object        
 3   date    10 non-null     datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 452.0+ bytes
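Note that the `date` column above jumps from 2023-05-28 to 2023-06-29, because `'May 23'` and `'Jun 23'` are parsed as a month and a day of the month. If the labels are instead read as month and two-digit year, and the birth date is May 26th, 2023 as stated in the question, the dates can be built from `days` alone. A minimal sketch under those assumptions (the `birth` variable and the `month_label` column are introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'days': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'health': [30, 30, 20, 20, 10, 10, 10, 10, 10, 0]})

# assumption: birth date taken from the question text
birth = pd.Timestamp('2023-05-26')

# vectorized offset: birth date plus "days since birth"
df['date'] = birth + pd.to_timedelta(df['days'], unit='D')

# optional: month labels derived from the dates, in the same style as the months column
df['month_label'] = df['date'].dt.strftime('%b %y')
```

With this reading, day 6 lands on 2023-06-01, which lines up with the row where the question's `months` column switches to `Jun 23`.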
Trenton McKinney

You can also define the width:

ax = sns.barplot(df, x='days', y='health', color='blue', width=1)
ax.set_xticklabels(df.months)

plt.xticks(rotation=45)
plt.show()

Output:

[image: bar plot with width=1]

Another way, with lineplot as suggested by @TrentonMcKinney:

ax = sns.lineplot(df, x='days', y='health', color='blue', drawstyle='steps-pre')
# a line plot has a numeric x-axis, so set the tick positions together with the labels
ax.set_xticks(df['days'].tolist(), df['months'].tolist())
ax.fill_betweenx(df['health'], df['days'], color='blue', step='post')

plt.xticks(rotation=45)
plt.show()

Output:

[image: step line plot with filled area]
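A variation on the step plot, sketched with plain matplotlib under a few assumptions: `fill_between` (rather than `fill_betweenx`) shades the area under the line, with `step='pre'` chosen to match `drawstyle='steps-pre'`, and ticks are placed only on rows that carry a month label, so a long DataFrame does not get one tick per day. The `labelled` variable is introduced here for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'days': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'health': [30, 30, 20, 20, 10, 10, 10, 10, 10, 0],
                   'months': ["May 23", " ", " ", " ", " ", " ", "Jun 23", " ", " ", " "]})

fig, ax = plt.subplots()
# step line; 'steps-pre' holds each value over the interval ending at its x
ax.plot(df['days'], df['health'], color='blue', drawstyle='steps-pre')
# shade the area under the step line, using the matching step mode
ax.fill_between(df['days'], df['health'], step='pre', color='blue', alpha=0.3)

# keep only the rows whose month label is not blank
labelled = df[df['months'].str.strip() != '']
# set tick positions and labels together (matplotlib >= 3.5)
ax.set_xticks(labelled['days'].tolist(), labelled['months'].tolist(), rotation=45)
```

Call `plt.show()` to display the figure. Because the shading follows the y-values along the x-axis, this version should also behave sensibly if health ever goes back up, which is an assumption beyond the sample data.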

Corralien