I am using Python 3.6.3 and I encounter a weird behavior from int() and isdigit() with the following code:

s = "..... not less than 66²/ 3 % of ......"
total = 0
for c in s:
    if c.isdigit():
        total += int(c)

ValueError: invalid literal for int() with base 10: '²'

I understand the error, and I know I can work around it with try/except. My question is this: if isdigit() returns True, then the character or string should convert with int() without error; otherwise isdigit() should return False. In other words, int() and isdigit() should be consistent with each other.

Gha93
  • Actually, [`isdigit`](https://docs.python.org/3/library/stdtypes.html#str.isdigit) is documented to work in exactly this case – roganjosh Feb 13 '19 at 22:13
  • This is so unbelievable that it feels like a bug but it actually makes sense. Duplicate of [this](https://stackoverflow.com/questions/44891070/whats-the-difference-between-str-isdigit-isnumeric-and-isdecimal-in-python) (kind of) – Benoît P Feb 13 '19 at 22:14
  • Why do you believe this to be the case? As noted, the documentation explicitly states otherwise. – juanpa.arrivillaga Feb 13 '19 at 22:14
  • Don't know if it's viable, but technically I guess you could use a regex to turn superscripts into `**n`; then your algorithm may be easier to implement. Or, just in the case of superscripts, use Python's exponent notation, `**n`. – Jab Feb 13 '19 at 22:16
  • @juanpa.arrivillaga I hold my hands up as guilty for having taken this method at face value. At least for me, it was too easy to feel I implicitly knew what it did; the corner case being that it wouldn't work with negative numbers for `all()` etc. Considering the rapid upvotes, I have a feeling there's a lot of code out there that can be crashed with this, explicitly documented or not :P – roganjosh Feb 13 '19 at 22:17
  • I mean technically in human terms it is a "number"... ¯\_(ツ)_/¯ – Jab Feb 13 '19 at 22:18
  • @roganjosh well, fundamentally, the correct approach is to use `try-except` here, IMO not relying on assumptions about how characters are classed (especially when you factor in unicode characters) – juanpa.arrivillaga Feb 13 '19 at 22:27
  • @juanpa.arrivillaga Devil's advocate being, what if you _wanted_ that 2 to be an integer? That's what's going on in the comments under the answer. – roganjosh Feb 13 '19 at 22:29
  • Why not allow `2√(π²)/π` as a valid int then? Where does it stop? – Benoît P Feb 13 '19 at 22:31
  • @BenoîtPilatte well, considering pi is not an integer, I could safely say that this is out of the realms of reasonable; even `'1.234'` can't be cast to `int` directly (but maybe a float :P ). I get the point you're making, though. – roganjosh Feb 13 '19 at 22:33
  • Good luck with your human language interpreter then... It looks like [it exists](https://www.wolframalpha.com/input/?i=2√(π²)%2Fπ)... – Benoît P Feb 13 '19 at 22:42
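
The `try-except` approach suggested in the comments above could look roughly like the following. This is only a sketch: the function name `sum_int_chars` is made up for illustration, and it assumes the goal is to add up just those characters that `int()` itself accepts.

```python
def sum_int_chars(text):
    """Sum every single character that int() accepts, skipping the rest."""
    total = 0
    for ch in text:
        try:
            total += int(ch)
        except ValueError:
            # int() rejects '²', as well as letters, spaces and punctuation.
            continue
    return total


s = "..... not less than 66²/ 3 % of ......"
print(sum_int_chars(s))  # 6 + 6 + 3 = 15; the superscript is skipped
```

Here `int()` is the only judge of what counts as a number, so no assumption about character classes is needed.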

1 Answer

This is exactly as documented:

`str.isdigit()`: Return true if all characters in the string are digits and there is at least one character, false otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.
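
To make the documented distinction concrete, here is a small sketch comparing the three built-in predicates with what `int()` does. The helper `classify` and the choice of sample characters are illustrative assumptions, not part of the quoted documentation.

```python
def classify(ch):
    """Report how the str predicates and int() treat one character."""
    try:
        value = int(ch)
    except ValueError:
        value = None  # int() refused the character
    return {
        "isdecimal": ch.isdecimal(),
        "isdigit": ch.isdigit(),
        "isnumeric": ch.isnumeric(),
        "int": value,
    }


samples = {
    "ASCII two": "2",
    "superscript two": "\u00b2",     # '²'
    "Arabic-Indic three": "\u0663",  # '٣'
}

for name, ch in samples.items():
    print(name, classify(ch))
```

For these samples, `isdecimal()` agrees with `int()`, while `isdigit()` additionally accepts the superscript that `int()` rejects.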

Chris_Rands
  • This makes it clear ... but it's very annoying ... I even tried `re.match('[0-9]+', c)` and I still get the same behavior ... is there a method that rejects those digits, or a method that can cast them? – Gha93 Feb 13 '19 at 22:17
  • `re.match('[0-9]+', c)` should work, and so should `c in "1234567890"` and `c.isdecimal()` – Benoît P Feb 13 '19 at 22:20
  • I like where @Gha93 is headed by asking if there could be a replacement for using `int` here. There's gotta be a way to marry `isdigit` with a corresponding method. – Jab Feb 13 '19 at 22:22
  • @Gha93 it depends what you want to do, but `all(c in '1234567890' for c in '66²')` might suffice – Chris_Rands Feb 13 '19 at 22:25
  • @Jaba https://stackoverflow.com/questions/24391892/printing-subscript-in-python? But then I'm not sure how to use on floats. – roganjosh Feb 13 '19 at 22:26
  • Yep, you are right, `re.match` works – Gha93 Feb 13 '19 at 22:27
  • Right. We should make it clear that `try: int(s)` followed by `except ValueError:` is the correct way of handling this. Some people don't like `try/except`, but it is the most Pythonic way. – Benoît P Feb 13 '19 at 22:28
  • @BenoîtPilatte It really depends what they want to do, `int('-3')` is valid of course but not all digits – Chris_Rands Feb 13 '19 at 22:29
  • `if int(s) < 0: raise ValueError("the input number should be a positive integer")` – Benoît P Feb 13 '19 at 22:34
  • The key question is what it means for isdigit() to return True: it says yes, it's a 'number', but you can't always cast it. – Gha93 Feb 13 '19 at 22:35
  • @BenoîtPilatte I think the mantra of "ask forgiveness not permission" is the better option in most situations. A tight loop where exceptions are expected to be common (most characters in that string aren't decimals) is usually not one of them. – Dunes Feb 13 '19 at 23:33
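
Two of the ideas from the comments above can be sketched without exceptions in the loop: filter with `isdecimal()` to reject the superscript, or use the standard library's `unicodedata.digit()` to convert it. The function names below are hypothetical, and whether `²` should count as 2 at all depends on what the asker actually wants.

```python
import unicodedata


def sum_decimal_digits(text):
    """Sum only characters that int() can convert (str.isdecimal() is True)."""
    return sum(int(ch) for ch in text if ch.isdecimal())


def sum_all_digits(text):
    """Sum every character that carries a Unicode digit value, including '²'."""
    total = 0
    for ch in text:
        # unicodedata.digit returns the given default (None here)
        # when the character has no digit value.
        value = unicodedata.digit(ch, None)
        if value is not None:
            total += value
    return total


s = "..... not less than 66²/ 3 % of ......"
print(sum_decimal_digits(s))  # 6 + 6 + 3 = 15
print(sum_all_digits(s))      # 6 + 6 + 2 + 3 = 17
```

The first function rejects the characters that `int()` would choke on; the second treats the superscript as the digit it represents.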