dealing with bad values when extracting data in python

Question

I have the following helper function that works out the total steps from multiple lists of readings

e.g. ['40', '1571', '1366', '691', '947', '1947', '108', '132', '950', '1884']

def data_check(readings, total):
    for reading in readings:
        steps = int(reading)
        total = total + steps
    return total

I need to alter the program so it can deal with bad data, such as strings or negative values, and adds zero to the total when a bad value occurs.

e.g ['40', '1571', '13vgs6', '-5', '947']

My solution was as follows, but it doesn't work out the total properly and gives me a division by zero error later in the program:

def data_check(readings, total):
    """ checks to see if data is good format and returns total """
    print(readings)

    for reading in readings:
        if type(reading) == int:
            steps = int(reading)
        else:
            steps = 0
        total = total + steps

    return total

Perhaps I am using a bad approach and should instead have a helper function that replaces all bad values in the list with zeros beforehand?

What is your desired output after passing `['40', '1571', '13vgs6', '-5', '947']` to your function? — Ajax1234, Sep 27 '17 at 01:19
See my answer, but in short, `type(...)` will *always* return `str` in this case because your argument is a list of all `str`ings. — hyper-neutrino, Sep 27 '17 at 01:20

hyper-neutrino · Answer 1 · 2017-09-27T01:29:38.607

1

`type`

The type of an object is essentially its class, so `type(reading)` only tells you that the value is a `str`, not whether that string represents an integer. You can't check for a valid number this way.
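As a quick illustration of the point above, here is a small sketch using the sample list from the question (the variable names `types_seen`, `passes` and `total` are only for demonstration). Every element is a `str`, so the `type(reading) == int` test fails even for `'40'`:

```python
readings = ['40', '1571', '13vgs6', '-5', '947']

# Every element is a str, even the ones that look like numbers.
types_seen = {type(reading) for reading in readings}
print(types_seen)    # {<class 'str'>}

# So the question's test is False for every reading...
passes = [type(reading) == int for reading in readings]
print(any(passes))   # False

# ...and each reading contributes 0 to the total.
total = sum(int(r) if type(r) == int else 0 for r in readings)
print(total)         # 0
```

A total that never moves off zero is a likely cause of the division by zero error mentioned in the question.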

`try` and `except`

You can do this:

try:
    value = int(reading)
except ValueError: # parsing to integer failed
    value = 0
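Put together with the question's function, this might look like the sketch below. It assumes negative readings should also count as zero, as the question asks, and it gives `total` a default of 0, which is an addition not in the original signature. Only `ValueError` is caught, so a non-string value such as `None` would still raise `TypeError`.

```python
def data_check(readings, total=0):
    """Add each valid, non-negative reading to total; bad values add 0."""
    for reading in readings:
        try:
            steps = int(reading)
        except ValueError:
            # Not an integer string, e.g. '13vgs6'
            steps = 0
        if steps < 0:
            # Negative step counts are treated as bad data
            steps = 0
        total += steps
    return total

print(data_check(['40', '1571', '13vgs6', '-5', '947']))  # 2558
```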

`re` regular expressions

You can also import re and check for truthiness of re.match(r"\d+$", reading) to determine if it represents an integer. The following regular expressions work:

Type             | Regex
Positive Integer | \d+$
Integer          | -?\d+$
Positive Real    | (\d+|\d*\.\d+)$
Real             | -?(\d+|\d*\.\d+)$

Making your code this:

if re.match(r"\d+$", reading): # or whichever regex you choose
    value = int(reading) # or float(reading)
else:
    value = 0
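The end anchor matters because `re.match` only anchors at the start of the string. The sketch below shows the difference on the bad value from the question; `parse_steps` is a hypothetical helper name, and `re.fullmatch` (available since Python 3.4) is used as an alternative to adding `$` by hand.

```python
import re

reading = '13vgs6'

# Without an end anchor, match() only needs digits at the start.
print(bool(re.match(r"\d+", reading)))       # True
# Anchored with $, the whole string must be digits.
print(bool(re.match(r"\d+$", reading)))      # False
# re.fullmatch requires the entire string to match the pattern.
print(bool(re.fullmatch(r"\d+", reading)))   # False

def parse_steps(reading):
    """Hypothetical helper: non-negative integer string -> int, else 0."""
    return int(reading) if re.fullmatch(r"\d+", reading) else 0

print([parse_steps(r) for r in ['40', '1571', '13vgs6', '-5', '947']])
# [40, 1571, 0, 0, 947]
```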

`str` functions

Strings have a method to determine whether they consist only of digits. str.isdigit(string), or equivalently string.isdigit(), works for non-negative integers (a leading minus sign makes it return False), so you can do this:

if reading.isdigit():
    value = int(reading)
else:
    value = 0

A more Pythonic form might be value = int(reading) if reading.isdigit() else 0, or just value = reading.isdigit() and int(reading). The second is shorter but less readable, and it yields False rather than 0 for a bad value (False still behaves as 0 in arithmetic).
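A minimal sketch of the `isdigit` approach on the question's sample data follows; `steps_from_digits` is a hypothetical helper name. It also shows what the `and` form evaluates to, plus one caveat: a few characters, such as superscript two, pass `isdigit()` even though `int()` rejects them.

```python
def steps_from_digits(reading):
    """Hypothetical helper: digits-only string -> int, anything else -> 0."""
    return int(reading) if reading.isdigit() else 0

readings = ['40', '1571', '13vgs6', '-5', '947']
print(sum(steps_from_digits(r) for r in readings))   # 2558

# The `and` form yields False rather than 0 for a bad value.
bad = '13vgs6'.isdigit() and int('13vgs6')
print(bad)          # False
print(bad + 40)     # 40, because False == 0

# Caveat: superscript two passes isdigit() but int() raises ValueError on it.
print('\u00b2'.isdigit())   # True
```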

edited Sep 27 '17 at 01:29

answered Sep 27 '17 at 01:19

hyper-neutrino


thank you very much, both your suggestions work great. – Jay Dilla Sep 27 '17 at 01:26
@JayDilla Also something I forgot to mention (editing answer now): put a dollar sign (`$`) after the regular expression to force it to match the entire string – hyper-neutrino Sep 27 '17 at 01:29
Yep, cheers. I ended up just using the try/except method, then added a check that the value is positive (if step < 0: step = 0). The course I am doing also uses pylint and you are required to be specific about exceptions, so I added 'except ValueError'. – Jay Dilla Sep 27 '17 at 02:10
@JayDilla Good idea; I would still recommend using `str.isdigit` though; throwing errors for testing values is probably not advised! ;) – hyper-neutrino Sep 27 '17 at 02:11

score 0 · Answer 2 · answered Sep 27 '17 at 01:22

0

From this answer you can look up the ASCII code of each character with "ord(char)"; if it is in the range 48-57 (the codes for the digits 0-9), the character is a digit, and you can convert a code back to its character with "chr(index)".
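One way this idea could be written out is sketched below; `is_ascii_digits` is a hypothetical helper name, not something from the linked answer. It accepts a string only if it is non-empty and every character's code falls in 48-57.

```python
def is_ascii_digits(text):
    """Hypothetical helper: True if text is non-empty and every
    character's code point is in 48-57, i.e. '0' through '9'."""
    return bool(text) and all(48 <= ord(char) <= 57 for char in text)

readings = ['40', '1571', '13vgs6', '-5', '947']
print([r for r in readings if is_ascii_digits(r)])   # ['40', '1571', '947']
print(ord('0'), ord('9'))                            # 48 57
print(chr(48), chr(57))                              # 0 9
```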

answered Sep 27 '17 at 01:22

Duc Anh Nguyen


score 0 · Answer 3 · answered Sep 27 '17 at 01:23

0

For the total number of integer values, you can try this:

import re
s = ['40', '1571', '13vgs6', '-5', '947']
# Count the strings that look like (optionally negative) integers
integers = len([i for i in s if re.findall(r"^-?\d+$", i)])
print(integers)

Output:

4

answered Sep 27 '17 at 01:23

Ajax1234


why are you using `findall`? Just use `match` – hyper-neutrino Sep 27 '17 at 01:29
@HyperNeutrino `findall` makes it easier to use "truthy" values to determine a match. In this case, if a match is not found, an empty list is created. A comparison with an empty list will return `False`, which can easily be implemented in a list comprehension. – Ajax1234 Sep 27 '17 at 01:33
@HyperNeutrino `match` can be used, however, it will always try to match from the start of the string by default, and that can be an issue when solving certain problems. – Ajax1234 Sep 27 '17 at 01:57
I am aware of that, and `search` fixes the issue, but here the desired outcome is for the match of digits to start at the beginning and end at the termination of the string (like your `^` and `$` do), and `match` works nicely for this. I *think* `match` is faster but I'm not quite sure. – hyper-neutrino Sep 27 '17 at 02:00
@HyperNeutrino I believe that in the worst-case scenario, both would be the same since that outcome would require a match to be found starting at the beginning of the string and extending to the very end; however, I could be wrong. – Ajax1234 Sep 27 '17 at 02:05
Unless `findall` optimizes on `^` to bind to the start (which it probably does), it might go through all positions. I'm not sure how regex is implemented though. – hyper-neutrino Sep 27 '17 at 02:06

score -1 · Answer 4 · answered Sep 27 '17 at 01:25

-1

You could use the "isinstance()" built-in function.

def numberofints(listofvalues):
   integers = 0
   for i in listofvalues:
      if isinstance(i, int): integers += 1
   return integers

answered Sep 27 '17 at 01:25

ArthurQ


I downvoted because `type(i) == int` does the same thing here and the actual problem lies in the fact that the list is a list of `str`ings... And also your code, if it worked, doesn't do the task wanted anyway... – hyper-neutrino Sep 27 '17 at 01:30
