1

I have a series of text files that include numerical references. I have word tokenized them and I would like to be able to identify where tokens are numbers and convert them to integer format.

mysent = ['i','am','10','today']

I am unsure how to proceed given the immutability of strings.

cookie1986

2 Answers

3

Please try [item if not item.isdigit() else int(item) for item in mysent]
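A minimal runnable sketch of that comprehension, using the `mysent` list from the question. The name `converted` is only illustrative. Because the comprehension builds a new list, the immutability of the individual strings is not a problem: each numeric string is replaced by a new int object rather than modified.

```python
mysent = ['i', 'am', '10', 'today']

# Keep non-digit tokens unchanged, convert all-digit tokens to int.
converted = [int(item) if item.isdigit() else item for item in mysent]

print(converted)  # ['i', 'am', 10, 'today']
```

Note that `.isdigit()` is False for tokens such as '-10', so negative numbers would stay as strings with this approach.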

Mike DeLong
1

If you try to convert a string that is not a representation of an int to an int, you get a ValueError.

You can try to convert all the elements to int, and catch ValueErrors:

mysent = ['i','am','10','today']

for i in mysent:
    try:
        print(int(i))
    except ValueError:
        continue

OUTPUT:

10

If you want to convert the numeric tokens in place, directly inside mysent, you can use:

mysent = ['i','am','10','today']

for n, i in enumerate(mysent):
    try:
        mysent[n] = int(i)
    except ValueError:
        continue

print(mysent)

OUTPUT:

['i', 'am', 10, 'today']
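If you would rather leave the original list untouched, the same try/except idea can be wrapped in a small helper that returns a new list. This is only a sketch: the function name `convert_tokens` is hypothetical and not part of any library.

```python
def convert_tokens(tokens):
    """Return a new list where integer-like tokens are converted to int."""
    converted = []
    for token in tokens:
        try:
            converted.append(int(token))
        except ValueError:
            # Not an integer representation: keep the original string.
            converted.append(token)
    return converted


mysent = ['i', 'am', '10', 'today']
print(convert_tokens(mysent))  # ['i', 'am', 10, 'today']
print(mysent)                  # ['i', 'am', '10', 'today'] (unchanged)
```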

.isdigit() IS NOT THE SAME AS try/except!!!!

In the comments it has been pointed out that .isdigit() may be more elegant and obvious. As stated in the Zen of Python, "There should be one-- and preferably only one --obvious way to do it."

According to the official documentation, .isdigit() will "Return True if all characters in the string are digits and there is at least one character, False otherwise."

Meanwhile, the try/except block catches the ValueError raised by applying int to a non-numerical string.

They may look similar, but their behavior is really different:

def is_int(n):
    try:
        int(n)
        return True
    except ValueError:
        return False

EXAMPLES:

Positive integer:

n = "42"

print(is_int(n))    # --> True
print(n.isdigit())  # --> True

Positive float:

n = "3.14"

print(is_int(n))    # --> False
print(n.isdigit())  # --> False

Negative integer:

n = "-10"

print(is_int(n))    # --> True
print(n.isdigit())  # --> False

Unicode superscript digit (\u00B2 is '²'):

n = "\u00B23455"

print(is_int(n))    # --> False
print(n.isdigit())  # --> True
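A few more edge cases in the same spirit, as a sketch. `is_int` is redefined here so the snippet runs on its own. The underscore case assumes Python 3.6 or later, where int() accepts underscores as digit separators.

```python
def is_int(n):
    try:
        int(n)
        return True
    except ValueError:
        return False


# int() tolerates surrounding whitespace, a leading sign, and (3.6+) underscores;
# .isdigit() rejects all of these because they are not digit characters.
for n in [" 42 ", "+7", "1_000", ""]:
    print(repr(n), is_int(n), n.isdigit())

# ' 42 '  True  False
# '+7'    True  False
# '1_000' True  False
# ''      False False
```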

These are only a few examples, and you can probably already tell which one better suits your needs.
The discussion about which one should be used is exhausting and never-ending; you can have a look at this couple of interesting SO Q&As:

Gsk
  • I don't think you should use a `try... except` block here, especially since you are essentially ignoring the caught `ValueError`. I think it is better to see if the item is an integer/all digits using `str::isdigit`. – Thomas Sep 26 '19 at 15:30
  • @Thomas actually, this is a topic that has [been widely discussed on SO](https://stackoverflow.com/questions/23294658/asking-the-user-for-input-until-they-give-a-valid-response), and the `try/except` block has been proven to be the best solution. – Gsk Sep 26 '19 at 15:32
  • There is no continuous `while True` loop asking for user input in this question. Following [PEP 20](https://www.python.org/dev/peps/pep-0020/) the "obvious way to do it" in this case is to use `str::isdigit` as it seems more intuitive, is provided by the prelude, and possibly more performant. – Thomas Sep 26 '19 at 15:37
  • @Thomas There is a [lot more discussion](https://stackoverflow.com/questions/25095453/is-there-a-way-other-than-try-except-and-isdigit-to-check-user-input-in) on SO about which one should be used. I do understand that without updating the answer, some doubt may remain: I'll update the answer with proper documentation. – Gsk Sep 26 '19 at 15:40
  • @Thomas If you're still interested in this, I've updated the answer showing some of the differences between `.isdigit()` and `try/except`! – Gsk Sep 26 '19 at 16:04
  • Thank you for the edit. Yeah, the ongoing discussion about this online is pretty interesting. I would have thought that Python had a standard way of testing whether a string is an integer, but apparently not. `str::isdigit` is probably not the best solution (though it is apparently fit-for-purpose for this question) because it doesn't handle negative integers, and `n.isnumeric()` with `int(float(n))` is worse. I think you're right: until Python 3.7+ comes up with a standard way of doing this, `try... except` is the best solution. – Thomas Sep 26 '19 at 16:33