
After researching this topic, I couldn't find an answer to this particular problem. I want to add a secondary x-axis that shows a categorical variable whose value stays the same over an interval of the primary axis, with each category label drawn once per interval rather than repeated under every point. This picture (made with Excel) shows something similar to what I want:

desired result

The data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data1 = {'Month': list(range(11,35))+list(range(34,42)),
         'Checkpoint': ['A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C','C','D','C','D','D','D','D','D','D','D'],
         'Litres':[216545.67,18034.45,25807.83,46136.23,68099.21,55436.35,56412.33,9347.52,3177.29,103.89,333.29,2355.41,
                     49063.72,113622.80,243639.97,303992.32,255471.55,267022.75,274952.92,619665.39,798969.54,1127476.60,
                     1563344.98,1051827.75,603167.32,1880605.49,1931002.19,
                     2970500.68,2362336.66,5311058.83,5071784.10,5325575.47]}

df = pd.DataFrame(data1)

Running the code above gives the following dataframe:

     Month   Checkpoint    Litres
0     11         A        216545.67
1     12         A        18034.45
2     13         A        25807.83
3     14         A        46136.23
4     15         A        68099.21
5     16         A        55436.35
6     17         B        56412.33
7     18         B        9347.52
8     19         B        3177.29
9     20         B        103.89
10    21         B        333.29
11    22         B        2355.41
12    23         C        49063.72
13    24         C        113622.80
14    25         C        243639.97
15    26         C        303992.32
16    27         C        255471.55
17    28         C        267022.75
18    29         C        274952.92
19    30         C        619665.39
20    31         C        798969.54
21    32         C        1127476.60
22    33         C        1563344.98
23    34         D        1051827.75
24    34         C        603167.32
25    35         D        1880605.49
26    36         D        1931002.19
27    37         D        2970500.68
28    38         D        2362336.66
29    39         D        5311058.83
30    40         D        5071784.10
31    41         D        5325575.47

I want to make a scatter plot of the data (with either matplotlib or seaborn) that shows df['Checkpoint'] on a second x-axis. This is what I have so far:

plt.figure(figsize = (14,7))
plt.scatter(df['Month'], df['Litres'], s=30)
JohanC
Guilherme

1 Answer


One possibility is to use the major ticks for the month labels and the minor ticks, which sit halfway between two months, as separators. Wherever the checkpoint label changes, a longer tick is drawn. Each checkpoint label is then placed midway between two long ticks.

One month (34) appears with two different checkpoint labels, D and C. It is not clear what should happen in that case. In the code below, a long major tick is drawn at that month.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

data1 = {'Month': list(range(11, 35)) + list(range(34, 42)),
         'Checkpoint': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C',
                        'C', 'C', 'C', 'C', 'D', 'C', 'D', 'D', 'D', 'D', 'D', 'D', 'D'],
         'Litres': [216545.67, 18034.45, 25807.83, 46136.23, 68099.21, 55436.35, 56412.33, 9347.52, 3177.29, 103.89,
                    333.29, 2355.41, 49063.72, 113622.80, 243639.97, 303992.32, 255471.55, 267022.75, 274952.92,
                    619665.39, 798969.54, 1127476.60, 1563344.98, 1051827.75, 603167.32, 1880605.49, 1931002.19,
                    2970500.68, 2362336.66, 5311058.83, 5071784.10, 5325575.47]}
df = pd.DataFrame(data1)

fig, ax = plt.subplots(figsize=(10, 5))

ax.scatter(df['Month'], df['Litres'], s=30, color='crimson')

# major ticks on every month, minor ticks on multiples of 0.5
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.5))
ax.set_xlim(df['Month'].iloc[0] - 0.5, df['Month'].iloc[-1] + 0.5)

checkpoints = list(df['Checkpoint'])

long_minor_ticks = [df['Month'].iloc[0] - 1]  # these minor ticks need to be longer
long_major_ticks = []  # these major ticks need to be longer

for m1, m, c1, c in zip(df['Month'][1:], df['Month'], df['Checkpoint'][1:], df['Checkpoint']):
    if m == m1:
        long_major_ticks.append(m)
    elif c != c1:
        long_minor_ticks.append(m)
long_minor_ticks.append(df['Month'].iloc[-1])

ax.tick_params(which='minor', axis='x', pad=20) # put the minor tick labels at some distance

checkpoint_labels = []
for tick, month in zip(ax.xaxis.get_minor_ticks(), range(df['Month'].iloc[0]-1, 100)):
    # a minor tick is long only where the checkpoint changes and no long major tick is adjacent
    is_long = (month in long_minor_ticks
               and month not in long_major_ticks
               and month + 1 not in long_major_ticks)
    tick.tick1line.set_markersize(35 if is_long else 18)
    checkpoint_labels.append('')

for tick, month in zip(ax.xaxis.get_major_ticks(), range(df['Month'].iloc[0]-1, 100)):
    # major ticks are hidden (length 0) except at months that need a long tick
    tick.tick1line.set_markersize(35 if month in long_major_ticks else 0)

# set the checkpoint letters at the positions between the long minor ticks
for t0, t1 in zip(long_minor_ticks[:-1],long_minor_ticks[1:]):
    if t1 != t0 + 1:
        ind = (t1+t0) // 2 - long_minor_ticks[0]
        checkpoint_labels[ind] = df['Checkpoint'].iloc[ind]
ax.set_xticklabels(checkpoint_labels, minor=True)

fig.subplots_adjust(bottom=0.15) # we need space to show the large ticks
plt.show()

resulting plot
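Since month 34 is the only month with two checkpoint labels in the posted data, it may help to check a larger dataset for such months before plotting. The snippet below is a minimal sketch of that check on a small made-up frame (the four rows are illustrative, not the full data); the names `per_month` and `ambiguous_months` are just for this example.

```python
import pandas as pd

# Small illustrative frame; month 34 carries two different checkpoints
df = pd.DataFrame({
    'Month': [33, 34, 34, 35],
    'Checkpoint': ['C', 'D', 'C', 'D'],
})

# Count the distinct checkpoints per month; more than one means the month is ambiguous
per_month = df.groupby('Month')['Checkpoint'].nunique()
ambiguous_months = per_month[per_month > 1].index.tolist()

print(ambiguous_months)
```

Any month this reports would hit the `m == m1` branch of the loop above and get a long major tick.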

JohanC
  • Hi, JohanC, thanks a lot for your answer and for your time to help me with this. I will test it now on my dataset and come back here to tell you if it worked for the many cases I have on the dataset. Once again, thanks! – Guilherme Mar 05 '20 at 11:27
  • Hi, Johan, when I run your code on my side, the long tick only shows up at 33 (not 34), and it doesn't show up at the other positions. I can't find what causes that. I'm using Jupyter. – Guilherme Mar 06 '20 at 11:45
  • Hi. Since I'm doing this at work, I had a few issues installing the newest matplotlib version, but now it is on, matplotlib 3.2.0. Sorry for the delay. When running your code, I obtain the following image: https://imgur.com/a/zv3dQDk And the tick that appears should be in fact in 34 instead of 33 – Guilherme Mar 09 '20 at 11:53
  • I am testing each line of the code and it seems that all the ticks go away when I run this line: `ax.tick_params(which='minor', axis='x', pad=20) # put the minor tick labels at some distance` Without it, however, the A B C labels show up on top of the numbers. The label that should be at 34 is still at 33 – Guilherme Mar 09 '20 at 11:59
  • JohanC, I've run the code line by line and found that the 'pad = 20' parameter might be the issue: once it is set, the long major ticks go away and only the one at 33 remains. I've fixed the label for 34 showing up at 33 by adding '-1' in this line `for tick, month in zip(ax.xaxis.get_major_ticks(), range(df['Month'].iloc[0]-1, 100)):` – Guilherme Mar 09 '20 at 18:08
  • I now changed the order of the changes. Does this work for you? A problem might be that some commands reset the ticks, so `tick.tick1line.set_markersize(...)` should be done as late as possible. – JohanC Mar 09 '20 at 19:23
  • It worked! Thanks a lot JohanC. I am now going to try adapting the code to a general case, since I took only a part (11 to 40) of the dataset (1 to 68) to post here. When I use it with other intervals it doesn't work properly, but that's certainly the math behind it. Once again, thanks a lot. – Guilherme Mar 10 '20 at 12:23
  • It is hard to know without seeing the data. The excel example only uses large ticks between the numbers, so that one is simpler. – JohanC Mar 10 '20 at 17:42
  • Oh yes, definitely, I'll upvote it. My data still has some problematic points that need fixing, so I'll fix it and then try to apply your algorithm. Thanks – Guilherme Mar 10 '20 at 19:41
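
For the simpler case mentioned in the comments, where long ticks only fall between month numbers, the positions can be worked out independently of the month range. The sketch below is one possible way to do that and is not part of the answer's code: it assumes every month has exactly one checkpoint, and the helper `checkpoint_runs` and the sample lists are made up for illustration.

```python
from itertools import groupby


def checkpoint_runs(months, checkpoints):
    """Group consecutive rows with the same checkpoint into runs.

    Returns a list of (label, first_month, last_month) tuples.
    """
    runs = []
    pairs = zip(months, checkpoints)
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        run_months = [month for month, _ in group]
        runs.append((label, run_months[0], run_months[-1]))
    return runs


# Illustrative data: nine months, three checkpoints
months = list(range(1, 10))
checkpoints = list('AAABBBBCC')

runs = checkpoint_runs(months, checkpoints)

# Long ticks sit half a month before the first run and half a month after each run
boundaries = [runs[0][1] - 0.5] + [last + 0.5 for _, _, last in runs]

# Each checkpoint letter is centred on its run
label_positions = [(first + last) / 2 for _, first, last in runs]

print(runs)
print(boundaries)
print(label_positions)
```

In matplotlib, `boundaries` and `label_positions` could then serve as tick positions, for example on a second axis created with `ax.secondary_xaxis`, though that part has not been tried against the full dataset here.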