0

I have several large comma-separated datasets with float numbers (hundreds of thousands), but some of the values may be missing (usually none are missing, or just a few hundreds) and marked as null with the string "null". I want to convert all the values to actual float-type data and store them in a list for further processing.

I use the following code with a try-except statement (which is the more pythonic way AFAIK) to convert the strings to float numbers, and if an exception is raised I check if the exception was raised due to a "null" value. If a "null" value was responsible for the exception I just pass. Otherwise, I raise the exception. Keep in mind that in a typical dataset there will be no or very few exceptions.

try:
    num = float(string)
except ValueError:
    if string == "null"
        pass
    else:
        raise ValueError

This works pretty fine, but I realized that when once in a while I get a dataset with lots of "null" values (thousands), the code above takes ages to execute (literally several minutes or hours, while it would have been processed within seconds normally)!

Then I tried to add an if statement in order to first check if the string-value equals to "null", and only if the string is not "null" I try to convert the string with the float function. In this case, I will have many redundant string comparisons in a typical dataset with very few or no "null" values:

try:
    if string != "null":
        num = float(string)
except:
    print("String {} could not be converted to float".format(string))
    raise

With the following code-sample I measured the execution time for both ways, and to my surprise I see that when I add the additional if statement, that avoids all of the exceptions unless an unexpected non-float string other than "null" appears, my code executes much faster.

#!/usr/bin/env python

from __future__ import print_function
from timeit import Timer

mylist = "99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,-89.955946,98.5,99.983333,99.45,108.925926,99.5,99.983333,100,99.966667,98.366667,99.983333,99.983333,91.516667,99.283333,99.65,99.933333,99.983333,99.933333,99.95,99.75,99.966667,99.733333,99.966667,100,99.75,99.916667,100,99.983333,99.983333,99.233333,99.933333,99.95,99.9,99.9,99.066667,99.933333,99.966667,99.966667,99.866667,99.316667,99.883333,99.9,99.983333,99.85,98.7,99.933333,99.95,99.983333,100,99.05,100,99.866667,99.983333,99.933333,99.883333,99.983333,100,99.983333,99.983333,99.95,100,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null"

float_list = []

def tryway():
    float_list = []
    mysplitlist = mylist.split(",")
    for string in mysplitlist:
        try:
            myfloat = float(string)
            float_list.append(myfloat)
        except ValueError:
            if string == 'null':
                pass
            else:
                raise ValueError
    #print(float_list)

def ifway():
    float_list = []
    mysplitlist = mylist.split(",")
    for string in mysplitlist:
        try:
            if string != "null":
                myfloat = float(string)
                float_list.append(myfloat)
        except:
            raise
    #print(float_list)

if __name__ == '__main__':
    print("Testing Try")
    tr = Timer("tryway()","from __main__ import tryway")
    print(tr.timeit(1000))
    print("Testing If")
    ir = Timer("ifway()","from __main__ import ifway")
    print(ir.timeit(1000))

Sample execution output:

$ ./test_try_if.py 
Testing Try
2.23783922195
Testing If
0.631629943848

Can someone please explain why is that happening? What would be the best way to implement this?

Vangelis Tasoulas
  • 3,109
  • 3
  • 23
  • 36
  • Briefly: yes, exceptions are slow. – TigerhawkT3 Sep 20 '16 at 17:18
  • "Keep in mind that in a typical dataset there will be no or very few exceptions" - then why did you run your test on a dataset with so many nulls? – user2357112 Sep 20 '16 at 17:22
  • @user2357112 "but I realized that when once in a while I get a dataset with lots of "null" values (thousands), the code above takes ages to execute (literally several minutes or hours, while it would have been processed within seconds normally)" – Vangelis Tasoulas Sep 20 '16 at 17:25
  • 1
    I'm not surprised that exception handling is 3.5x slower, but that does not agree with your question saying "several minutes or hours ... within seconds normally". Is the DB itself slow (>60x slower) when it runs into a lot of nulls? – nigel222 Sep 20 '16 at 17:25
  • @nigel222 I actually read the data from the database in a batch in one go, and store it in a single string (as in the sample code). This is fast. Then I do the rest in the way that you see in the sample code. I split the string in commas, and I do the string to float conversion. A typical conversion (when not many null values) for all the data points takes several seconds (usually not more than 30 - roughly). However, when I get a dataset with thousands of null values, I have to wait for several minutes. If it happens that the null values are the majority, it may even take hours! – Vangelis Tasoulas Sep 20 '16 at 17:30
  • Well, your example has 893 nulls and 997 non-nulls which slows it down ~3.5fold. So I'd expect all nulls to be ~8 times slower than all-floats. What happens if you time the inputs `",".join(["null"]*5000)` versus `",".join(["3.45"]*5000)`? (substitute your worst-case multiple for 5000) – nigel222 Sep 20 '16 at 17:39
  • With 5000 all nulls: Testing Try: 9.87418699265, Testing If: 0.683084964752. With 5000 all float numbers: Testing Try: 2.20391392708, Testing If: 2.41577196121 – Vangelis Tasoulas Sep 20 '16 at 17:44
  • 1
    I just ran your tryway() in Python 2.7.12 interpreter with half a million nulls, and on an i3-4370 with 8G RAM, it took well under a second. So I'm pretty sure that there is some other problem at your end. Sorry, off home now. – nigel222 Sep 20 '16 at 17:46
  • Interesting... I run this on a system with four AMD Opteron 6366 HE processors (64 cores total, all pretty much idle) with 256GB RAM. OS is Ubuntu 14.04 with python 2.7.6. When I run it on my laptop with a Core i7-3537U CPU, same OS and python version, the script executes on half time. Still far from what you reported (half million nulls in under a second). – Vangelis Tasoulas Sep 20 '16 at 17:57

0 Answers0