
This is the sample code that generates the example visualization:


# Importing necessary libraries 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
from dateutil.parser import parse

%matplotlib inline

# Below, I convert the date strings into actual date objects.

date_strings = ['2020-01-20 03:32:44',
'2020-03-26 05:13:07',
'2020-03-26 13:32:09',
'2020-03-26 23:57:49',
'2020-03-27 15:30:00',
'2020-03-28 00:04:32',
'2020-03-28 13:26:15',
'2020-03-29 00:11:22',
'2020-04-02 00:30:00',
'2020-04-06 14:00:00']

dates = []

for date in date_strings:
    dt = parse(date)
    print(dt.date())
    dates.append(dt.date())

# finally making the graph:

x1 = np.array([x for x in range(10)])
x = dates
y = [x+60 for x in range(10)]

plt.xticks(ticks=dates,labels=dates,rotation=30)  # rotation must be a number, not the string '30'
plt.plot(x,y)
plt.scatter(x,y)

plt.show()

What I get is this:

[image: the plot produced by the code above]

Now this is tricky, because most conventional fixes insert arbitrary dates in between the recorded ones. For example, a tick label might suddenly appear at 15 Feb, where no data point exists.

However, I don't want to include tick labels where a data point wasn't actually recorded.

Essentially, for my requirements, the conditions seem a bit stringent:

  • The x-axis can't have tick labels showing that don't occur in the data. (I can do this)
  • The first and last dates must always be shown. (I can also do this)
  • Any dates in between should be shown if they fit without cluttering the x-axis. (This is the part where no existing solution seems to help me!)
Yeahprettymuch

1 Answer


Here's my attempt at a solution.

Disclaimer: there may be a better way to do this. I threw something together to see if I could reduce the clutter. I don't fully understand the code, but it did produce a result.

I based the solution on this post, which may serve as a useful resource and a better explanation of what I came up with.

Solution Code

import numpy as np
import matplotlib.pyplot as plt
from dateutil.parser import parse

date_strings = ['2020-01-20 03:32:44',
'2020-03-26 05:13:07',
'2020-03-26 13:32:09',
'2020-03-26 23:57:49',
'2020-03-27 15:30:00',
'2020-03-28 00:04:32',
'2020-03-28 13:26:15',
'2020-03-29 00:11:22',
'2020-04-02 00:30:00',
'2020-04-06 14:00:00']

dates = []

for date in date_strings:
    dt = parse(date)
    print(dt.date())
    dates.append(dt.date())

x1 = np.array([x for x in range(10)])
x = dates
y = [x+60 for x in range(10)]

plt.xticks(ticks=dates,labels=dates,rotation=90)  # rotation must be a number, not the string '90'

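# solution starts
N = 10  # number of label slots to reserve width for
plt.gca().margins(x=0)
plt.gcf().canvas.draw()  # draw once so the tick labels have a measurable size
tl = plt.gca().get_xticklabels()
maxsize = max([t.get_window_extent().width for t in tl])  # widest label, in pixels
m = 0.01 # inch margin
s = maxsize/plt.gcf().dpi*N+2*m  # new figure width, in inches
margin = m/plt.gcf().get_size_inches()[0]  # margin as a fraction of the current width

# right is a fraction of the figure width, so 25 places the axes' right edge far beyond the figure
plt.gcf().subplots_adjust(left=margin, right=25.-margin)
plt.gcf().set_size_inches(s, plt.gcf().get_size_inches()[1])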
plt.plot(x,y)
plt.scatter(x,y)

plt.show()

Resulting Graph

[image: uncluttered graph with space between x-axis ticks]

Things to Note

plt.gcf().subplots_adjust(left=margin, right=25.-margin)

Changing the right parameter changes the spacing of the x-ticks. However, the current implementation comes with a trade-off:

Accuracy & Spacing VS Image Width

The smaller the number, the narrower the image. However, the plot points are then represented with less Accuracy & Spacing.

I found 25 to be a good number, at the cost of a very wide image. I am not sure whether this will be an issue, but I thought I would mention it.

m

Has a small influence over the Image Width if changed alone.

The smaller the number, the smaller the image's width.
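
To make the effect of m concrete, here is a minimal sketch of the width formula from the solution, pulled out into a standalone function. The function name and the 70-pixel label width are assumptions for illustration only; in the real code the label width comes from get_window_extent().

```python
def figure_width_inches(max_label_px, dpi, n_labels, margin_in):
    """Same arithmetic as `s = maxsize/dpi*N + 2*m` in the solution code."""
    return max_label_px / dpi * n_labels + 2 * margin_in

# Assumed values: 70 px wide labels, 100 dpi, 10 label slots.
narrow = figure_width_inches(70, 100, 10, 0.01)
wide = figure_width_inches(70, 100, 10, 0.10)

# m is added once per side, so the two widths differ by 2 * (0.10 - 0.01) inches.
print(round(narrow, 2), round(wide, 2))
```

With these assumed numbers the label term contributes about 7 inches, while m adds only hundredths to tenths of an inch, which fits the small influence described above.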

plt.xticks(ticks=dates,labels=dates,rotation=90)

rotation influences the spacing of the graph, similar to the right parameter.

The closer the labels are to parallel with the x-axis (rotation near 0), the worse the Accuracy & Spacing and the smaller the Image Width.

The closer the labels are to parallel with the y-axis (rotation near 90), the better the Accuracy & Spacing and the larger the Image Width.

I understand the original code had 30 for this rotation parameter. Not sure if this was important to maintain in the solution but I thought I would mention it.
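
If the wide image turns out to be a problem, a different approach (not what the code above does) would be to leave the figure size alone and label only a subset of the recorded dates. The sketch below is just an illustration under assumptions: thin_ticks and min_gap_days are hypothetical names I made up, and the 4-day gap is a guess that would need tuning to the figure width and label size.

```python
from datetime import date, timedelta

def thin_ticks(dates, min_gap_days):
    """Pick tick dates from the data only: always the first and last date,
    plus in-between dates that sit at least min_gap_days away from the
    previously kept tick and from the last date."""
    unique = sorted(set(dates))
    if len(unique) <= 2:
        return unique
    last = unique[-1]
    gap = timedelta(days=min_gap_days)
    kept = [unique[0]]
    for d in unique[1:-1]:
        # Greedy pass: keep a date only if it will not crowd its neighbours.
        if d - kept[-1] >= gap and last - d >= gap:
            kept.append(d)
    kept.append(last)
    return kept

# The dates from the question, after dropping the time of day.
dates = [date(2020, 1, 20), date(2020, 3, 26), date(2020, 3, 26),
         date(2020, 3, 26), date(2020, 3, 27), date(2020, 3, 28),
         date(2020, 3, 28), date(2020, 3, 29), date(2020, 4, 2),
         date(2020, 4, 6)]

kept = thin_ticks(dates, min_gap_days=4)
print(kept)
```

The result could then be passed to plt.xticks(ticks=kept, labels=kept, rotation=30), so every label is a date from the data and the first and last dates are always present.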

I hope this helped.