I'm having a problem with the Text objects that matplotlib uses to represent tick labels.

For testing purposes I need to check the values of the tick labels that are created in a plot. If the label is a string or a positive number, there is no problem: a unicode string is returned, I test it (or convert it to a number, depending on the circumstances) and everything is fine.

But if the label is a negative number, what I get back is a mangled unicode string, for a reason I cannot understand.

Let's take this example code:

import pylab as plt
fig, ax = plt.subplots(1)
ax.plot([-1, 0, 1, 2], range(4))
labels = ax.get_xticklabels()

Now, if I ask for the text content of the second label (the 0), I obtain a normal unicode string:

labels[1].get_text()
# u'0.0'

but the text of the first one (the -1) is something strange:

labels[0].get_text()
# u'\u22121'

This is printed correctly in the terminal, but in this case I need to compare it with a numerical value, and every conversion fails, both with int and with float.

I tried to convert it to a UTF-8 string with

text = labels[0].get_text()
text.encode('utf8')
# '\xe2\x88\x921'

but again it is something that prints correctly and raises an error when converted. I also looked at the unicodedata module, but it looks like it can only convert single characters, so in this case it is useless. I also tried to normalize the string with unicodedata.normalize in every possible form, but again with no success.
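
To make the failure concrete, here is a minimal sketch using only the standard library. The literal u'\u22121' is a stand-in for the label text shown above, and the variable names are just for illustration. It checks that none of the four normalization forms changes the string, and that float() still rejects it.

```python
import unicodedata

text = u'\u22121'  # stand-in for the tick label text: U+2212 followed by '1'

# U+2212 has no decomposition, so normalization should leave the text as-is.
unchanged = {}
for form in ('NFC', 'NFD', 'NFKC', 'NFKD'):
    unchanged[form] = unicodedata.normalize(form, text) == text
    print('%s unchanged: %s' % (form, unchanged[form]))

# float() only accepts the ASCII hyphen-minus as a sign.
try:
    float(text)
    converted = True
except ValueError:
    converted = False
print('float() succeeded: %s' % converted)
```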

I moved on to the PyPI module unidecode (as suggested in Python and character normalization), again without any success:

from unidecode import unidecode
unidecode(text)
# '[?]1'

I have also tried to avoid font issues using the solution in Non-ASCII characters in Matplotlib, but with the same result (I'm not sure it should even be relevant, since that is a display problem...). The question Accented characters in Matplotlib does not help either, as it is concerned with the display and not with the value itself.

I'm starting to feel a little lost... I know that Python 2.7 has some unicode "difficulties", but normally I can avoid them one way or another.

I know that the issue is the minus sign, as I can avoid the problem with a brute-force replacement of the culprit:

text.replace(u'\u2212', '-')
# u'-1'

But this is more a hack than a solution, and I'm almost certain that it's not stable across different systems, so I would like something closer to a proper solution.

I'm working with

  • python 2.7.3
  • matplotlib 1.2.0
  • pylab 1.7.0
  • IPython 0.13.1

on Kubuntu 12.10.

Thank you very much for your help!

EDIT:

Corrected the order of the plot arguments, as I had the x and y inverted, sorry.

EDIT2:

Similar information is available at this link: http://www.coniferproductions.com/2012/12/17/unicode-character-dump-in-python/

At the end it shows how some books use a more aesthetically pleasing minus sign that the Python interpreter does not recognize as a valid character.

EDIT3:

Riddle solved. The character that matplotlib returns is the "MINUS SIGN" (U+2212), i.e. the typographically correct sign for minus. The one the keyboard produces is in fact the "HYPHEN-MINUS" (U+002D), which is commonly used but not typographically correct. See Wikipedia for an explanation: http://en.wikipedia.org/wiki/Hyphen-minus.
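
The difference between the two characters can be checked directly with the unicodedata module. This is just a small sketch; the variable names are made up for the illustration.

```python
import unicodedata

typographic = u'\u2212'  # the character matplotlib puts in the tick label
keyboard = u'-'          # the character the keyboard produces

for ch in (typographic, keyboard):
    # Print the code point and the official Unicode name of each character.
    print('U+%04X %s' % (ord(ch), unicodedata.name(ch)))
# U+2212 MINUS SIGN
# U+002D HYPHEN-MINUS
```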

So the simple replace I used is in fact the correct practical thing to do, but "ethically" it is a bug in Python (2.7 and 3.x alike), which does not recognize the correct symbol for the minus sign.

See the bug report at http://bugs.python.org/issue6632
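
Wrapped into a small helper, the replacement might look like the sketch below. The name label_to_float is hypothetical and not part of matplotlib; it only assumes that the label text is a plain decimal number whose sign may be U+2212.

```python
MINUS_SIGN = u'\u2212'  # U+2212 MINUS SIGN, as used in matplotlib tick labels


def label_to_float(text):
    """Convert tick-label text to a float, accepting U+2212 as the sign.

    Hypothetical helper for illustration only.
    """
    return float(text.replace(MINUS_SIGN, u'-'))


print(label_to_float(u'\u22121'))    # -1.0
print(label_to_float(u'0.0'))        # 0.0
print(label_to_float(u'\u22120.5'))  # -0.5
```

Labels that already use the ASCII hyphen-minus pass through unchanged, so the same helper works whether or not the unicode minus is enabled.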

EDIT4:

To disable this behavior there is a simple solution in matplotlib: just modify the rcParams, either in the matplotlibrc file or programmatically.

import matplotlib as mpl
mpl.rcParams['axes.unicode_minus'] = False
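
As a sketch of how this could be used in a test: the example below assumes a reasonably recent matplotlib and the non-interactive Agg backend, and uses matplotlib.rc_context so that the setting applies only inside the block instead of globally. The figure is drawn explicitly because the tick labels may be empty before the first draw.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

with matplotlib.rc_context({'axes.unicode_minus': False}):
    fig, ax = plt.subplots()
    ax.plot([-1, 0, 1, 2], range(4))
    fig.canvas.draw()  # make sure the tick labels have been generated
    texts = [label.get_text() for label in ax.get_xticklabels()]

# With the unicode minus disabled, non-empty labels convert directly.
values = [float(t) for t in texts if t]
print(values)
```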
EnricoGiampieri
  • Your last edit solved my problem of minus signs not rendering, thanks – Mark Dec 09 '13 at 08:45
  • I suggest you turn your EDIT4 into an answer so that people searching for it can find it more easily! – Konstantin Nov 19 '14 at 15:04
  • I only had this problem when saving as a `pdf` on the `agg` backend using the `Arial` font family. The `png` turned out fine. Any ideas why? Was the `pdf` also the culprit for you? (But your EDIT4 worked for me, thanks!) – aseagram Oct 27 '15 at 06:35
  • Your last edit should actually be an answer itself – MERose Jun 13 '18 at 14:29

2 Answers

Use plt.xticks() instead of ax.get_xticklabels():

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1)
ax.plot([-1, 0, 1, 2], range(4))
plt.savefig('/tmp/test.png')
loc, labels = plt.xticks()
print(type(loc))
# <type 'numpy.ndarray'>
print(loc)
# [-1.  -0.5  0.   0.5  1.   1.5  2. ]
unutbu
  • +1 But they might be actually trying to check the text labels are valid – wim Mar 21 '13 at 03:20
  • I'm sorry, but this doesn't help, as the thing that I need to check, the text labels, are the same as in the other case, with the same exact problem of a strange unicode minus sign. – EnricoGiampieri Mar 21 '13 at 10:01
  • @EnricoGiampieri: What are the valid characters that you want the labels translated to? For example, is `list('01234567890.-')` the complete list? – unutbu Mar 21 '13 at 10:06
  • Yes, in this case that is the expected list. – EnricoGiampieri Mar 21 '13 at 10:40

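Most Unicode characters have names (unicodedata.name raises ValueError for the few that do not, such as control characters, unless a default is given). We can inspect the name for recognized numerical words (DIGIT.keys()) and on that basis substitute "normal" numerical characters (DIGIT.values()) for the given unicode label: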

import matplotlib.pyplot as plt
import unicodedata as UD

DIGIT = {
    'MINUS': u'-',
    'ZERO': u'0',
    'ONE': u'1',
    'TWO': u'2',
    'THREE': u'3',
    'FOUR': u'4',
    'FIVE': u'5',
    'SIX': u'6',
    'SEVEN': u'7',
    'EIGHT': u'8',
    'NINE': u'9',
    'STOP': u'.'
    }

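def guess(unistr):
    # Map each character to an ASCII digit, sign or decimal point by looking
    # for a known word in its Unicode name. Characters without a name
    # (UD.name returns the default '') or without a match are dropped.
    # .items() works on both Python 2 and Python 3.
    return ''.join([value for u in unistr
                    for key, value in DIGIT.items()
                    if key in UD.name(u, '')])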

fig, ax = plt.subplots(1)
ax.plot([-1, 0, 1, 2], range(4))
plt.savefig('/tmp/test.png')
labels = ax.get_xticklabels()
for label in labels:
    label = label.get_text()
    print(guess(label))

yields

-1.0
-0.5
0.0
0.5
1.0
1.5
2.0
unutbu