Altering y-axis tick labels to minimize clutter but NOT showing data points apart from what was observed in the dataset

Question

The sample code below generates the example visualization:


# Importing necessary libraries 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
from dateutil.parser import parse

%matplotlib inline

# Below, I just convert date strings into an actual date object.

date_strings = ['2020-01-20 03:32:44',
'2020-03-26 05:13:07',
'2020-03-26 13:32:09',
'2020-03-26 23:57:49',
'2020-03-27 15:30:00',
'2020-03-28 00:04:32',
'2020-03-28 13:26:15',
'2020-03-29 00:11:22',
'2020-04-02 00:30:00',
'2020-04-06 14:00:00']

dates = []

for date in date_strings:
    dt = parse(date)
    print(dt.date())
    dates.append(dt.date())

# finally making the graph:

x1 = np.array([x for x in range(10)])
x = dates
y = [x+60 for x in range(10)]

plt.xticks(ticks=dates, labels=dates, rotation=30)  # rotation as a number; recent matplotlib rejects the string '30'
plt.plot(x,y)
plt.scatter(x,y)

plt.show()

What I get is this:

Now this is tricky, because most conventional ways of fixing this involve including random date points in the middle. E.g. there might suddenly be a tick label at 15th Feb.

However, I don't want to include tick labels where a data point wasn't actually recorded.

Essentially, for my requirements, the conditions seem a bit stringent:

The x-axis can't have tick labels showing that don't occur in the data. (I can do this)
The first and last dates must always be shown. (I can also do this)
Any dates in between that can be shown without cluttering the x-axis should be shown. (This is the part where no existing solution seems to help me!)

Jordan Simpson · Accepted Answer · 2020-04-14T04:01:21.140

Here's my attempt at a solution.

Disclaimer: there may be a better way to do this. I threw something together to see whether I could reduce the clutter. I don't fully understand the code, but it did produce a result.

Nonetheless, I used this post to come up with a solution and it might serve as a useful resource and better explanation for what I came up with.

Solution Code

import numpy as np
import matplotlib.pyplot as plt
from dateutil.parser import parse

date_strings = ['2020-01-20 03:32:44',
'2020-03-26 05:13:07',
'2020-03-26 13:32:09',
'2020-03-26 23:57:49',
'2020-03-27 15:30:00',
'2020-03-28 00:04:32',
'2020-03-28 13:26:15',
'2020-03-29 00:11:22',
'2020-04-02 00:30:00',
'2020-04-06 14:00:00']

dates = []

for date in date_strings:
    dt = parse(date)
    print(dt.date())
    dates.append(dt.date())

x1 = np.array([x for x in range(10)])
x = dates
y = [x+60 for x in range(10)]

plt.xticks(ticks=dates, labels=dates, rotation=90)  # rotation as a number; recent matplotlib rejects the string '90'

#solution starts
N = 10
plt.gca().margins(x=0)
plt.gcf().canvas.draw()
tl = plt.gca().get_xticklabels()
maxsize = max([t.get_window_extent().width for t in tl])
m = 0.01 # inch margin
s = maxsize/plt.gcf().dpi*N+2*m
margin = m/plt.gcf().get_size_inches()[0]

plt.gcf().subplots_adjust(left=margin, right=25.-margin)
plt.gcf().set_size_inches(s, plt.gcf().get_size_inches()[1])
plt.plot(x,y)
plt.scatter(x,y)

plt.show()

Resulting Graph

Uncluttered graph with space between x-axis ticks

Things to Note

plt.gcf().subplots_adjust(left=margin, right=25.-margin)

Changing the right parameter changes the spacing of the x-ticks. (right is specified as a fraction of the figure width, so a value of 25 places the right edge of the axes well beyond the nominal figure.) However, the current implementation comes with a trade-off:

Accuracy & Spacing VS Image Width

The smaller the number, the less width the image takes up. However, the plot loses Accuracy & Spacing in how it represents the data points.

I found 25 to be a good value, at the cost of a very wide image. I am not sure whether this will be an issue, but I thought I would mention it.

Has a small influence over the Image Width if changed alone.

The smaller the number, the smaller the image's width.
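
For reference, the width that the solution code passes to set_size_inches can be written as a small standalone function. This is only a restatement of the line s = maxsize/plt.gcf().dpi*N+2*m from the code above; the function name and the sample numbers are hypothetical and chosen just to show how each input moves the result.

```python
def figure_width_inches(max_label_px, dpi, n_labels, margin_in=0.01):
    """Width in inches: n_labels slots as wide as the widest label,
    plus a margin on each side (same arithmetic as `s` in the solution)."""
    return max_label_px / dpi * n_labels + 2 * margin_in


# Hypothetical inputs: the widest tick label is 80 px on a 100 dpi figure.
for n in (5, 10, 20):
    print(n, figure_width_inches(80, 100, n))
```

With these made-up inputs, changing the number of label slots scales the width almost linearly, while the margin only adds a small constant.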

plt.xticks(ticks=dates, labels=dates, rotation=90)

Rotation influences the spacing of the graph in a similar way to the right parameter.

The closer the labels are to parallel with the x-axis, the worse the Accuracy & Spacing, but the smaller the Image Width.

The closer the labels are to parallel with the y-axis, the better the Accuracy & Spacing, but the larger the Image Width.

I understand the original code used 30 for this rotation parameter. I'm not sure whether that was important to keep in the solution, but I thought I would mention it.
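
If the image width turns out to be a problem, a different approach may be worth trying: keep the figure size and instead show only a subset of the observed dates as ticks. The sketch below is not the method used above. It assumes that a minimum gap in days is an acceptable stand-in for "too much clutter", and the helper name thin_ticks and the parameter min_gap_days are made up for illustration. It always keeps the first and last dates and never produces a date that is not in the data.

```python
from datetime import date


def thin_ticks(dates, min_gap_days):
    """Pick tick dates from the observed dates only.

    The first and last dates are always kept. A date in between is kept
    only if it is at least min_gap_days after the previously kept date
    and at least min_gap_days before the last date.
    """
    unique = sorted(set(dates))
    if len(unique) <= 2:
        return unique
    last = unique[-1]
    kept = [unique[0]]
    for d in unique[1:-1]:
        far_from_previous = (d - kept[-1]).days >= min_gap_days
        far_from_last = (last - d).days >= min_gap_days
        if far_from_previous and far_from_last:
            kept.append(d)
    kept.append(last)
    return kept


# The dates from the question, with the time of day dropped.
observed = [
    date(2020, 1, 20), date(2020, 3, 26), date(2020, 3, 26),
    date(2020, 3, 26), date(2020, 3, 27), date(2020, 3, 28),
    date(2020, 3, 28), date(2020, 3, 29), date(2020, 4, 2),
    date(2020, 4, 6),
]
print(thin_ticks(observed, min_gap_days=3))
```

The returned list could then be used as both the ticks and the labels in plt.xticks after plotting. Since days only approximate on-screen distance, min_gap_days would need tuning to the figure width and the label rotation.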

I hope this helped.

That seems really creative - thanks for the extensive explanation too. — Yeahprettymuch, Apr 14 '20 at 04:16
