I have about 50 million lists of strings in Python like this one:
["1", "1.0", "", "foobar", "3.0", ...]
And I need to turn these into a list of floats and Nones like this one:
[1.0, 1.0, None, None, 3.0, ...]
Currently I use some code like:
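def to_float_or_None(x):
    try:
        return float(x)
    except ValueError:
        # Empty or non-numeric strings become None
        return None

result = []
for record in database:
    # list(...) so that a list is stored on Python 3 as well, where map returns an iterator
    result.append(list(map(to_float_or_None, record)))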
The to_float_or_None function is taking in total about 750 seconds (according to cProfile)... Is there a faster way to perform this conversion from a list of strings to a list of floats/Nones?
Update
I had identified the to_float_or_None function as the main bottleneck. I cannot find a significant difference in speed between using map and using list comprehensions.
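For anyone who wants to reproduce the comparison, here is a minimal sketch using timeit. The sample record and the repeat counts are made up for illustration and are not my real data, so the absolute numbers will differ.

```python
import timeit

def to_float_or_None(x):
    try:
        return float(x)
    except ValueError:
        return None

# Hypothetical sample record, repeated to get a measurable workload.
record = ["1", "1.0", "", "foobar", "3.0"] * 200

def with_map():
    return list(map(to_float_or_None, record))

def with_comprehension():
    return [to_float_or_None(x) for x in record]

# Both variants must produce the same result before timing them.
assert with_map() == with_comprehension()

print("map:          ", timeit.timeit(with_map, number=200))
print("comprehension:", timeit.timeit(with_comprehension, number=200))
```

Both variants call the same Python function once per element, so it seems plausible that the call itself dominates and not the looping construct.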
I applied Paulo Scardine's tip to check the input, and it already saves 1/4 of the time.
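def to_float_or_None(x):
    # Cheap pre-check: skip empty strings and anything not starting with a digit or "."
    # (note that this also rejects signed values such as "-1.0")
    if not (x and x[0] in "0123456789."):
        return None
    try:
        return float(x)
    except ValueError:
        return None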
The use of generators was new to me, so thank you for the tip Cpfohl and Lattyware! This indeed speeds up the reading of the file even more, but I was hoping to save some memory by converting the strings to floats/Nones.
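As a rough sketch of what the generator approach looks like when combined with the conversion: the name converted_records and the small database list below are hypothetical stand-ins, not my actual code or data.

```python
def to_float_or_None(x):
    # Same pre-check as above: only try float() on plausible candidates.
    if not (x and x[0] in "0123456789."):
        return None
    try:
        return float(x)
    except ValueError:
        return None

def converted_records(records):
    """Yield one converted list at a time instead of building one big list."""
    for record in records:
        yield [to_float_or_None(x) for x in record]

# Hypothetical stand-in for the real database iterable.
database = [["1", "1.0", "", "foobar", "3.0"], ["2.5", "x"]]

for floats in converted_records(database):
    print(floats)
```

This only keeps one converted record alive at a time, so it helps when the records can be processed one by one. It does not help if all 50 million converted lists have to be held in memory at once.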