I have a "mixed list" (here meaning a list that may contain lists, dicts, strings, ints or floats), and I would like to obtain a string representation of it, preferably a "pretty" one, in which the floats have a limited number of decimals. Ideally, I could then save this string to a file and load it again.

As a rule of thumb, I'd like all floats with absolute value > 0.01 to be formatted with two decimals only, and the rest to be formatted in scientific notation.

Looking at some of the SO posts, I managed to come up with the following example (works with Python 2.7.16 and Python 3.7.4 on MSYS2, Windows 10):

#!/usr/bin/env python

import math
import pprint

# https://stackoverflow.com/questions/1447287/format-floats-with-standard-json-module
import json
from json import encoder
encoder.FLOAT_REPR = lambda o: format(o, '.2f')

# https://stackoverflow.com/questions/1447287/format-floats-with-standard-json-module
def round_floats(o):
  if isinstance(o, float): return "{:.2f}".format(o) if abs(o)>0.01 else "{:.2e}".format(o)
  if isinstance(o, dict): return {k: round_floats(v) for k, v in o.items()}
  if isinstance(o, (list, tuple)): return [round_floats(x) for x in o]
  return o

import collections
try: # https://stackoverflow.com/questions/53978542/how-to-use-collections-abc
  import collections.abc
  collectionsAbc = collections.abc
except (ImportError, AttributeError) as e:
  collectionsAbc = collections
import numbers

# https://stackoverflow.com/questions/7076254/rounding-decimals-in-nested-data-structures-in-python
def fpformat(thing, formatfunc):
  if isinstance(thing, dict):
    try: # Python 2
      thingiter = thing.iteritems()
    except: # Python 3
      thingiter = thing.items()
    return type(thing)((key, fpformat(value, formatfunc)) for key, value in thingiter)
  if isinstance(thing, collectionsAbc.Container):
    return type(thing)(fpformat(value, formatfunc) for value in thing)
  if isinstance(thing, numbers.Number):
    return formatfunc(thing)
  return thing
def formatfloat(thing):
  return "%.3g" % float(thing)

#############

# make a source array, mixed data

tarr = [
  ["aa",         "bb",        "cc",        "dd",        "ee"          ],
  [ {'v': 1.1},  {'w': 2.2},  {'x': 3.3},  {'y': 4.4},  {'z': 5.5555} ],
  [ 10,          20,          30,          40,          50            ],
  [ 11.1,        22.22,       33.333,      44.4444,     55.55555      ]
]

# create some more decimals:
appendrow = []

for ind, tnum in enumerate(tarr[2]):
  tpnum = ((ind+1.0)/(ind+2.0))*math.pi*tnum
  appendrow.append(tpnum)

tarr.append(appendrow)

appendrow = []

for ind, tnum in enumerate(tarr[2]):
  tpnum = ((ind+1.0)/(ind+2.0))*math.pi*tnum/100000.0
  appendrow.append(tpnum)

tarr.append(appendrow)

tarr_ppf_string = pprint.pformat(tarr)

print("printout 1:\n{}\n".format(tarr_ppf_string))

tarr_ppf_string2 = pprint.pformat(round_floats(tarr))

print("printout 2:\n{}\n".format(tarr_ppf_string2))

tarr_json_string = json.dumps(tarr)

print("printout 3:\n{}\n".format(tarr_json_string))

tarr_json_string2 = json.dumps(round_floats(tarr))

print("printout 4:\n{}\n".format(tarr_json_string2))

tarr_fp_string = fpformat(tarr, formatfloat)

print("printout 5:\n{}\n".format(tarr_fp_string))

The output of this script in Python 3 is this:

printout 1:
[['aa', 'bb', 'cc', 'dd', 'ee'],
 [{'v': 1.1}, {'w': 2.2}, {'x': 3.3}, {'y': 4.4}, {'z': 5.5555}],
 [10, 20, 30, 40, 50],
 [11.1, 22.22, 33.333, 44.4444, 55.55555],
 [15.707963267948966,
  41.8879020478639,
  70.68583470577035,
  100.53096491487338,
  130.89969389957471],
 [0.00015707963267948965,
  0.00041887902047863906,
  0.0007068583470577034,
  0.0010053096491487337,
  0.0013089969389957472]]

printout 2:
[['aa', 'bb', 'cc', 'dd', 'ee'],
 [{'v': '1.10'}, {'w': '2.20'}, {'x': '3.30'}, {'y': '4.40'}, {'z': '5.56'}],
 [10, 20, 30, 40, 50],
 ['11.10', '22.22', '33.33', '44.44', '55.56'],
 ['15.71', '41.89', '70.69', '100.53', '130.90'],
 ['1.57e-04', '4.19e-04', '7.07e-04', '1.01e-03', '1.31e-03']]

printout 3:
[["aa", "bb", "cc", "dd", "ee"], [{"v": 1.1}, {"w": 2.2}, {"x": 3.3}, {"y": 4.4}, {"z": 5.5555}], [10, 20, 30, 40, 50], [11.1, 22.22, 33.333, 44.4444, 55.55555], [15.707963267948966, 41.8879020478639, 70.68583470577035, 100.53096491487338, 130.89969389957471], [0.00015707963267948965, 0.00041887902047863906, 0.0007068583470577034, 0.0010053096491487337, 0.0013089969389957472]]

printout 4:
[["aa", "bb", "cc", "dd", "ee"], [{"v": "1.10"}, {"w": "2.20"}, {"x": "3.30"}, {"y": "4.40"}, {"z": "5.56"}], [10, 20, 30, 40, 50], ["11.10", "22.22", "33.33", "44.44", "55.56"], ["15.71", "41.89", "70.69", "100.53", "130.90"], ["1.57e-04", "4.19e-04", "7.07e-04", "1.01e-03", "1.31e-03"]]

printout 5:
[['<generator object fpformat.<locals>.<genexpr> at 0x6ffffcc57d0>', '<generator object fpformat.<locals>.<genexpr> at 0x6ffffcc57d0>', '<generator object fpformat.<locals>.<genexpr> at 0x6ffffcc57d0>', '<generator object fpformat.<locals>.<genexpr> at 0x6ffffcc57d0>', '<generator object fpformat.<locals>.<genexpr> at 0x6ffffcc57d0>'], [{'v': '1.1'}, {'w': '2.2'}, {'x': '3.3'}, {'y': '4.4'}, {'z': '5.56'}], ['10', '20', '30', '40', '50'], ['11.1', '22.2', '33.3', '44.4', '55.6'], ['15.7', '41.9', '70.7', '101', '131'], ['0.000157', '0.000419', '0.000707', '0.00101', '0.00131']]

Essentially, what I'd want is "printout 2" - except, with the numbers remaining numbers, and not printed as strings; that is, I'd want that printout to be this:

[['aa', 'bb', 'cc', 'dd', 'ee'],
 [{'v': 1.10}, {'w': 2.20}, {'x': 3.30}, {'y': 4.40}, {'z': 5.56}],
 [10, 20, 30, 40, 50],
 [11.10, 22.22, 33.33, 44.44, 55.56],
 [15.71, 41.89, 70.69, 100.53, 130.90],
 [1.57e-04, 4.19e-04, 7.07e-04, 1.01e-03, 1.31e-03]]

How can I achieve this kind of printout in Python? (needing this for Python 3, but a solution for Python 2 would be great, too)

sdbbs

1 Answer

OLD ANSWER

The problem is that you are inserting the floats as strings and not as floats. You are printing dictionaries containing strings, and so they are printed as strings. You want to insert the numbers as floats.

You can round floats to a given number of decimal places without converting them to strings.

def round_floats(o):
  # Line 13 of the script: use round() instead of string formatting
  if isinstance(o, float): return round(o, 2)
  if isinstance(o, dict): return {k: round_floats(v) for k, v in o.items()}
  if isinstance(o, (list, tuple)): return [round_floats(x) for x in o]
  return o

Replacing the string formatting with the round(float, decimals) function gives the following output for printout 2. Note that any value whose magnitude is below 0.005 rounds to 0.0, which is what happens to the last row:

printout 2:
[['aa', 'bb', 'cc', 'dd', 'ee'],
 [{'v': 1.1}, {'w': 2.2}, {'x': 3.3}, {'y': 4.4}, {'z': 5.56}],
 [10, 20, 30, 40, 50],
 [11.1, 22.22, 33.33, 44.44, 55.56],
 [15.71, 41.89, 70.69, 100.53, 130.9],
 [0.0, 0.0, 0.0, 0.0, 0.0]]
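
The zeros in the last row come straight from round(): at two decimals, nothing is left of a value that small. A minimal sketch of the effect, using one value from the question's last row (the variable names are only for illustration):

```python
small = 0.00015707963267948965  # first value of the last row in the question

rounded = round(small, 2)           # rounding to two decimals discards the value
formatted = "{:.2e}".format(small)  # formatting keeps three significant digits

print(rounded)
print(formatted)
```

So round() alone cannot satisfy the "scientific notation below 0.01" rule; the small values need to be handled by significant digits rather than by decimal places.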


NEW ANSWER

EDIT: after much debugging, I ran into a bit of a problem. I could not find a way to force the pretty-printer to always use a particular exponential format.

I tried using this bit of code to override the pretty-printer's float formatting, but it doesn't work for lists: the override is not applied to a value that is nested in a list, dictionary or other container. Unfortunately, without rewriting half the pretty-printer code, this approach does not seem viable.

The good news is that there might be no need. You can format each float with the precision you want and cast the result back to a float. This does not guarantee that the number will be printed in scientific notation, but in most cases this will suit you.

def round_floats(o):
  # Edited line 13: format as before, then cast the string back to float
  if isinstance(o, float): return float("{:.2f}".format(o) if abs(o)>0.01 else "{:.2e}".format(o))
  if isinstance(o, dict): return {k: round_floats(v) for k, v in o.items()}
  if isinstance(o, (list, tuple)): return [round_floats(x) for x in o]
  return o  # leave strings, ints and anything else untouched

It might be better to use the decimal module instead, and limit the numbers to three significant digits.

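import decimal
decimal.getcontext().prec = 3  # three significant digits

def round_floats(o):
  # The unary plus rounds the Decimal to the context precision
  if isinstance(o, float): return float(+decimal.Decimal(o))
  if isinstance(o, dict): return {k: round_floats(v) for k, v in o.items()}
  if isinstance(o, (list, tuple)): return [round_floats(x) for x in o]
  return o  # leave strings, ints and anything else untouched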
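
The unary plus is what does the rounding here: Decimal(o) converts the float exactly, and applying + re-rounds the value to the context precision, that is, to three significant digits. Below is a small sketch of the same idea that sets the precision in a local context rather than globally. The names round_sig and digits are hypothetical, not part of the decimal module:

```python
import decimal


def round_sig(x, digits=3):
    """Round a float to `digits` significant digits via the decimal module."""
    # localcontext() restores the previous context on exit, so the
    # precision change does not leak into other Decimal arithmetic.
    with decimal.localcontext() as ctx:
        ctx.prec = digits
        # Decimal(x) is exact; unary plus applies the context rounding.
        return float(+decimal.Decimal(x))


print(round_sig(130.89969389957471))
print(round_sig(55.55555))
print(round_sig(0.00015707963267948965))
```

Setting decimal.getcontext().prec affects every later Decimal operation in the thread, which is why a local context may be the safer choice inside a larger program.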

In either case, the bad news is that small numbers are not printed the way you want. A number like 0.0001 keeps its positional representation (as opposed to 1.0e-4), because Python's float repr switches to scientific notation only when the decimal exponent is below -4 (or at least 16). Values smaller than 0.0001 are therefore printed in scientific notation automatically, but values between 0.0001 and 0.01 are not.
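
As far as I know, the switch between the two notations happens at a fixed threshold: repr() of a float uses scientific notation once the decimal exponent drops below -4. A quick check with a few arbitrary sample values:

```python
values = (0.000157, 0.0001, 0.0000157)

# repr() is what pprint and print() end up using for floats inside containers
shown = [repr(v) for v in values]

for text in shown:
    print(text)
```

The first two values stay in positional notation, while the third one, being below 0.0001, comes out in scientific notation.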

Output (decimal version, using the question's tarr):

[['aa', 'bb', 'cc', 'dd', 'ee'],
 [{'v': 1.1}, {'w': 2.2}, {'x': 3.3}, {'y': 4.4}, {'z': 5.56}],
 [10, 20, 30, 40, 50],
 [11.1, 22.2, 33.3, 44.4, 55.6],
 [15.7, 41.9, 70.7, 101.0, 131.0],
 [0.000157, 0.000419, 0.000707, 0.00101, 0.00131]]
 #Note that the bottom row is not in scientific notation, because the float
 #repr only switches to it for values below 0.0001. If these numbers were
 #smaller, they would be printed in scientific notation.
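
If the exact format matters, one workaround that leaves the pretty-printer alone is to wrap every float in a float subclass with its own __repr__, since pprint appears to fall back to repr() for values it does not handle specially. This is only a sketch, not something I have tested on the full script: PrettyFloat and wrap_floats are hypothetical names, and the data is a shortened version of the question's tarr. The resulting text can be read back with ast.literal_eval, which yields plain floats.

```python
import ast
import pprint


class PrettyFloat(float):
    """Hypothetical float subclass that controls its own printed form."""

    def __repr__(self):
        # Two decimals above the 0.01 threshold, scientific notation otherwise.
        if abs(self) > 0.01:
            return "{:.2f}".format(self)
        return "{:.2e}".format(self)


def wrap_floats(o):
    """Return a copy of a nested structure with floats wrapped in PrettyFloat."""
    if isinstance(o, float):
        return PrettyFloat(o)
    if isinstance(o, dict):
        return {k: wrap_floats(v) for k, v in o.items()}
    if isinstance(o, (list, tuple)):
        return [wrap_floats(x) for x in o]
    return o  # strings, ints and everything else pass through unchanged


data = [
    ["aa", "bb"],
    [{"v": 1.1}, {"z": 5.5555}],
    [10, 20],
    [130.89969389957471, 0.00015707963267948965],
]

text = pprint.pformat(wrap_floats(data), width=100)
print(text)

# The printed form is valid Python literal syntax, so it can be loaded again.
restored = ast.literal_eval(text)
print(restored)
```

The values stay numbers in the data structure and only their printed form changes. As far as I can tell, json.dumps in Python 3 does not pick up the custom __repr__, so this helps with pprint and repr() only.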

anerisgreat
  • Thanks @anerisgreat - that looks almost great, except now the last row is all zeroes, which it shouldn't be (which is why I mentioned the scientific notation, if value is smaller than 0.01). Any way to do that as well? – sdbbs Oct 16 '19 at 08:06
  • Sure thing! Updated answer. – anerisgreat Oct 16 '19 at 10:32
  • Many thanks @anerisgreat - the updated answer goes into much detail about what the problem is, much appreciated! – sdbbs Oct 16 '19 at 12:10