The question is pretty long because of the pictures, but there isn't much content in reality. Question at the bottom.

Hi, I have a series of 30000 samples of ages ranging from 21 to 74. Series head:

0    24
1    26
2    34
3    37
4    57
Name: AGE, dtype: int64

I plot it using the built-in pandas .plot method:

import matplotlib.pyplot as plt

# original_df is the DataFrame that holds the AGE column
age_series = original_df['AGE']
fig = plt.figure()
fig.suptitle('Age distribution')
age_series.value_counts().sort_index().plot(kind='bar')

My problem is that this makes the x-axis not really user-friendly: [image: original plot]

I could increase the horizontal width between bars, but I don't want to do that. Instead, I'd like to make only a subset of the x-axis labels visible. I tried using MaxNLocator and MultipleLocator by adding this line:

plt.gca().xaxis.set_major_locator(plt.MaxNLocator(10))

However, it doesn't achieve my goal: it now labels the bars incorrectly and removes ticks (which I understand, since using these functions changes the xticks object): [image: plot with MaxNLocator(10)]
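A minimal sketch of why the labels end up wrong, assuming every age from 21 to 74 occurs so that bar i shows age 21 + i. A pandas bar plot places bars at positions 0, 1, 2, ... and attaches a fixed list of label strings; once a locator picks new tick positions, that list is still consumed in order. The `tick_positions` list and the helper `label_for_position` are hypothetical names used only for illustration.

```python
# Assumption for illustration: every age from 21 to 74 occurs, so bar i shows age 21 + i.
ages = list(range(21, 75))

# Hypothetical tick positions a locator might choose on the 0-based bar axis.
tick_positions = [0, 10, 20, 30, 40, 50]

# A fixed label list is consumed in order: one label per tick, whatever its position.
sequential = dict(zip(tick_positions, (str(a) for a in ages)))

# Looking the label up by bar position gives the intended text instead.
by_position = {pos: str(ages[pos]) for pos in tick_positions}


def label_for_position(x, pos=None):
    """Return the age shown at bar position x, or '' outside the bars."""
    i = int(round(x))
    return str(ages[i]) if 0 <= i < len(ages) else ''


print(sequential)
print(by_position)
```

A function with this `(x, pos)` signature can be wrapped in `matplotlib.ticker.FuncFormatter` and set as the major formatter, so that whatever positions the locator picks are labelled with the matching age. The labelled ages are then correct, though not necessarily multiples of 10.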

An ugly solution is to loop over the xticks:

xticks = plt.gca().xaxis.get_major_ticks()
for i in range(len(xticks)):
    if i % 10 != 0:
        xticks[i].set_visible(False)

This produces the following render, which is close to what I want: [image]

I'm not satisfied, however, as the loop is too naive. I'd like to be able to access the value of each xtick (its label) and decide based on it, so that only labels that are multiples of 10 are shown.

This works (based upon this answer):

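ax = plt.gca()
# Read the current label texts and blank out every value that is not a multiple of 10
labels = [t.get_text() for t in ax.get_xticklabels()]
labels = [s if int(s) % 10 == 0 else '' for s in labels]
# Apply the filtered labels in a single call
ax.set_xticklabels(labels)

[image: plot produced by the workaround]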

Question: Is there a different solution that feels more Pythonic/efficient? Or do you have suggestions on how to make this data readable?

Jean Rostan
  • Does this answer your question? [Changing the "tick frequency" on x or y axis in matplotlib?](https://stackoverflow.com/questions/12608788/changing-the-tick-frequency-on-x-or-y-axis-in-matplotlib) – Mr. T Feb 04 '21 at 19:46

3 Answers

I think you could try something like this:

ax = plt.gca()
pos = [9,19,29,39,49]
l = [30,40,50,60,70]
ax.set(xticks=pos, xticklabels=l)
Joe
  • I modified it, check it now @JeanRostan. Before, I didn't notice that the values started from 21 and not from 0. – Joe Jun 12 '18 at 15:31
  • Thanks, it does work and is cleaner than the ugly loop. I accepted the answer below you as it's more generic, but thanks a lot. – Jean Rostan Jun 12 '18 at 16:05

To be more generic, you could do something like this:

import numpy as np

ax = plt.gca()

max_value = original_df['AGE'].max()
min_value = original_df['AGE'].min()
number_of_steps = 5
l = np.arange(min_value, max_value+1, number_of_steps)

ax.set(xticks=l, xticklabels=l)
Barthelemy Pavy
  • Thanks, that's what I was looking for, it's a lot cleaner than randomly looping. However, it needs a slight adjustment for the position: using `xticks=l` will make the ticks shift to the right since my starting data point is 21. Here's the fix I added: `ax.set(xticks=[x - l[0] for x in l], xticklabels=l)` – Jean Rostan Jun 12 '18 at 16:07
  • No, I keep them but I relocate them using my first real value, so they are properly placed. If you omit xticks, all the values are 1-spaced, which causes mislabelling. – Jean Rostan Jun 12 '18 at 16:19
  • Ah yes, I didn't see your solution. Maybe you could just do `plt.xticks(l)` instead of `ax = plt.gca() ax.set(xticks=l, xticklabels=l)`. But I have nothing to test it with right now. – Barthelemy Pavy Jun 12 '18 at 16:21
  • Doesn't work: the labels are 5-spaced, but the label values are 1-spaced (there are 5 spaces between label 21 and label 22), and they are shifted to the right. Thanks nonetheless, your initial solution is what I was looking for. – Jean Rostan Jun 12 '18 at 16:27
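The position adjustment discussed in these comments can be sketched as a small helper. This is only an illustration, not code from the thread: `multiples_of` and `label_multiples` are hypothetical names, and the sketch assumes the bars sit at 0-based positions in the order of the sorted index, as in a pandas bar plot. Because it walks the index itself, it should also cope with ages that are missing from the data.

```python
def multiples_of(values, step=10):
    """Return (positions, labels) for the entries of `values` divisible by `step`.

    Positions are 0-based bar positions; labels are the matching values as strings.
    """
    pairs = [(pos, v) for pos, v in enumerate(values) if v % step == 0]
    positions = [pos for pos, _ in pairs]
    labels = [str(v) for _, v in pairs]
    return positions, labels


def label_multiples(ax, values, step=10):
    """Keep only the ticks whose value is a multiple of `step` on the given Axes."""
    positions, labels = multiples_of(values, step)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)


# With every age from 21 to 74 present, age 30 sits at position 9, age 40 at 19, ...
print(multiples_of(range(21, 75)))
```

With the question's data this might be used as `counts = age_series.value_counts().sort_index()`, then `ax = counts.plot(kind='bar')`, then `label_multiples(ax, counts.index)`.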

You could calculate all multiples of ten within your range of ages and pass them to your plot command via the xticks kwarg:

import numpy as np
import matplotlib.pyplot as plt

age_series = original_df['AGE']

xt = np.arange(age_series.min(), age_series.max()+1)
xt = xt[xt%10==0]

fig = plt.figure()
fig.suptitle('Age distribution')
age_series.value_counts().sort_index().plot(kind='bar', xticks=xt)
SpghttCd
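As a further option for readability, not taken from the answers above: drawing the bars with Matplotlib's own `Axes.bar` at their actual age values keeps the x-axis numeric, so tick positions and tick values coincide and no relabelling is needed. The sketch below is an assumption-laden illustration; `bar_on_numeric_axis` is a hypothetical helper name and it expects integer ages.

```python
def bar_on_numeric_axis(ax, ages, counts, step=10):
    """Draw bars at their actual age values and tick every multiple of `step`.

    `ax` is a Matplotlib Axes; `ages` and `counts` are equally long sequences.
    Returns the tick values that were set.
    """
    ax.bar(ages, counts)
    # Smallest multiple of `step` that is >= the smallest age (ceiling division).
    first = -(-min(ages) // step) * step
    ticks = list(range(first, max(ages) + 1, step))
    ax.set_xticks(ticks)
    return ticks
```

With the question's data this might be called as `counts = age_series.value_counts().sort_index()`, `fig, ax = plt.subplots()`, `bar_on_numeric_axis(ax, counts.index, counts.values)`.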