3

After parsing a file, I obtain a list of strings containing numerical values, let's say:

my_list = ['1', '-2.356', '00.57', '0', '-1', '02678', '0.005367', '0', '1']

In order to obtain these numerical values, I do the following:

new_list = [float(i) for i in my_list]

The problem is that the integer values, which make up the majority of the file I am handling, are also converted to float and thus occupy more memory, among other issues (I have to use some of them as indexes, so they need to be converted to int at some point).

Is there an efficient way to convert to float only the strings that need it (I must not lose any precision) and convert all the others to int?

jonrsharpe
Matina G
  • You could see if the string contains a `'.'`, use that to decide whether you want `float(i)` or `int(i)`. What about floats that can be exactly represented as ints, like `1.0`? – jonrsharpe Nov 24 '17 at 12:50
  • Also see https://stackoverflow.com/questions/379906/parse-string-to-float-or-int – PM 2Ring Nov 24 '17 at 13:11

2 Answers

4

You can build a function that converts the strings to integers if possible and to floats otherwise (and, if even that fails, leaves them as strings).

my_list = ['1', '-2.356', '00.57', '0', '-1', '02678', '0.005367', '0', '1']

def converter(n):
  try:
    return int(n)
  except ValueError:
    try:
      return float(n)
    except ValueError:
      return n  # <- left as string

print([converter(x) for x in my_list])  # -> [1, -2.356, 0.57, 0, -1, 2678, 0.005367, 0, 1]

This works because int('2.3') is not the same as int(2.3). The first raises a ValueError, while the second truncates the float and returns 2.

Also note that the order in which the try blocks are arranged is very important since float('2') does work. As a result, casting to int has to be tried first!
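A minimal sketch of the behaviour described above, using only built-ins. The helper name `raises_value_error` is made up for this illustration and is not part of the answer's code:

```python
def raises_value_error(func, arg):
    # Hypothetical helper for this illustration only:
    # reports whether func(arg) raises ValueError.
    try:
        func(arg)
    except ValueError:
        return True
    return False

# int() rejects a string that holds a decimal number...
print(raises_value_error(int, '2.3'))   # True
# ...but truncates an actual float towards zero.
print(int(2.3))                         # 2
# float() accepts integer-looking strings, so trying float() first
# would turn every value in the list into a float.
print(float('2'))                       # 2.0
print(raises_value_error(float, '2'))   # False
```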

Ma0
  • `+one` since you also handled strings (if `int` and `float` fail) – akash karothiya Nov 24 '17 at 13:02
  • I was going to close this question as a dupe, but I can't find a good target that gives priority to this technique, which I agree is the best way to do it, unless you can guarantee that all the strings are simple, i.e., no strings like '12E3' or 'Nan'. – PM 2Ring Nov 24 '17 at 13:16
  • + one for seeing first that it was not trivial, and proposing a good approach. – Reblochon Masque Nov 24 '17 at 13:29
2

The difference between an int and a float in your list is whether or not the string contains a '.', so you can use that to choose which conversion to apply.

new_list = [float(elt) if '.' in elt else int(elt) for elt in my_list] 
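This check assumes that every float in the file is written with a decimal point. A small sketch of where that assumption holds and where it breaks (the function name `dot_based_convert` is hypothetical, introduced only for this example):

```python
def dot_based_convert(elt):
    # Same rule as the list comprehension above: '.' means float, otherwise int.
    return float(elt) if '.' in elt else int(elt)

print([dot_based_convert(s) for s in ['1', '-2.356', '02678']])  # [1, -2.356, 2678]

# Scientific notation without a '.' is a valid float string
# but not a valid int string, so the int() branch fails.
try:
    dot_based_convert('1E3')
except ValueError as exc:
    print('failed:', exc)
```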

edit:

To handle special float cases, the converter function proposed by @EvKounis is a good idea. Expanding on it after @PM2Ring's remark:

my_list = ['1', '1E3', '-inf', 'inf', 'NaN', 'nan', '-2.356', '00.57', '0', '-1', '02678', '0.005367', '0', '1', '398472398472943657410843104729572471308172374018301478744723974523987452938729847194719841471476574572394710481048075434810482398752481038185739847239847294365741084310472957247130817237401830147874472397452398745293872984719471984147147657457239471048104807543481048239875248103818573984723984729436574108431047295724713081723740183014787447239745239874529387298471947198414714765745723947104810480754348104823987524810381857398472398472943657410843104729572471308172374018301478744723974523987452938729847194719841471476574572394710481048075434810482398752481038185739847239847294365741084310472957247130817237401830147874472397452398745293872984719471984147147657457239471048104807543481048239875248103818573984723984729436574108431047295724713081723740183014787447239745239874529387298471947198414714765745723947104810480754348104823987524810381857398472398472943657410843104729572471308172374018301478744723974523987452938729847194719841471476574572394710481048075434810482398752481038185739847239847294365741084310472957247130817237401830147874472397452398745293872984719471984147147657457239471048104807543481048239875248103818573984723984729436574108431047295724713081723740183014787447239745239874529387298471947198414714765745723947104810480754348104823987524810381857398472398472943657410843104729572471308172374018301478744723974523987452938729847194719841471476574572394710481048075434810482398752481038185739847239847294365741084310472957247130817237401830147874472397452398745293872984719471984147147657457239471048104807543481048239875248103818573984723984729436574108431047295724713081723740183014787447239745239874529387298471947198414714765745723947104810480754348104823987524810381857']

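def convert_to_int_or_float(elt):
    try:
        e = float(elt)  # fails for strings that are not numbers at all
    except ValueError:
        raise  # re-raise, keeping the original error message
    elt = elt.lower()
    if '.' in elt or 'e' in elt or 'inf' in elt or 'nan' in elt:
        pass  # float-only notation: keep the float
    else:
        e = int(elt)  # plain integer string: int() keeps full precision
    return e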

[convert_to_int_or_float(e) for e in my_list]

output:

[1,
 1000.0,
 -inf,
 inf,
 nan,
 nan,
 -2.356,
 0.57,
 0,
 -1,
 2678,
 0.005367,
 0,
 1,
 398472398472943657410843104729572471308172374018301478744723974523987452938729847194719841471476574572394710481048075434810482398752481038185739847239847294365741084310472957247130817237401830147874472397452398745293872984719471984147147657457239471048104807543481048239875248103818573984723984729436574108431047295724713081723740183014787447239745239874529387298471947198414714765745723947104810480754348104823987524810381857398472398472943657410843104729572471308172374018301478744723974523987452938729847194719841471476574572394710481048075434810482398752481038185739847239847294365741084310472957247130817237401830147874472397452398745293872984719471984147147657457239471048104807543481048239875248103818573984723984729436574108431047295724713081723740183014787447239745239874529387298471947198414714765745723947104810480754348104823987524810381857398472398472943657410843104729572471308172374018301478744723974523987452938729847194719841471476574572394710481048075434810482398752481038185739847239847294365741084310472957247130817237401830147874472397452398745293872984719471984147147657457239471048104807543481048239875248103818573984723984729436574108431047295724713081723740183014787447239745239874529387298471947198414714765745723947104810480754348104823987524810381857398472398472943657410843104729572471308172374018301478744723974523987452938729847194719841471476574572394710481048075434810482398752481038185739847239847294365741084310472957247130817237401830147874472397452398745293872984719471984147147657457239471048104807543481048239875248103818573984723984729436574108431047295724713081723740183014787447239745239874529387298471947198414714765745723947104810480754348104823987524810381857]
Reblochon Masque
  • Let's hope the OP's data doesn't contain floats in scientific notation without a decimal point, eg '12E3'. – PM 2Ring Nov 24 '17 at 13:08
  • Now it handles `NaN`, `-inf`, `inf`, and scientific notation, thanks for demanding more. – Reblochon Masque Nov 24 '17 at 13:28
  • And now integers that have a value larger than the maximum float value are also handled properly. – Reblochon Masque Nov 24 '17 at 13:44
  • Ev. Kounis's technique handles all the "funny" floats without needing specific tests. But anyway, you could simplify those tests a little. Eg: `elt = elt.lower()` `if any(u in elt for u in ('.', 'e', 'inf', 'nan')): pass` – PM 2Ring Nov 24 '17 at 13:44
  • It does indeed. When I replace the case by case test with your any(statement) the code breaks for very large ints: it returns `float('inf')` instead. I have no clue why, but I've been in front of my screen for far too long to look into it for now. – Reblochon Masque Nov 24 '17 at 13:57
  • Weird. If `elt` represents a very large int then `e = float(elt)` will succeed, returning `float('inf')`, as you said. But the `e = int(elt)` should also be executed, and that's what happens on my machine (running Python 3.6.0), using your `if` test or my version. – PM 2Ring Nov 24 '17 at 14:46
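The simplification discussed in the comments above can be sketched as follows. This is an illustrative variant, not the answer's exact code: the early `return` and the test value `big` are assumptions made for the example.

```python
import math

def convert_to_int_or_float(elt):
    e = float(elt)  # raises ValueError for strings that are not numbers
    # The any() test suggested in the comments: a float-only marker keeps the float.
    if any(u in elt.lower() for u in ('.', 'e', 'inf', 'nan')):
        return e
    # Plain integer string: int() preserves arbitrary precision.
    return int(elt)

big = '9' * 400  # hypothetical value: too large for a float, fine for an int
print(math.isinf(float(big)))                    # True
print(convert_to_int_or_float(big) == int(big))  # True
print(convert_to_int_or_float('1E3'))            # 1000.0
```

With this structure, a very large integer string still reaches `int(elt)`, which matches what PM 2Ring reports seeing.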