0

My issue is that I have a pre-process that reads data in from a csv and reconciles on 2 client-given fields (a document count and a check total), and then parses the data and calculates the totals for itself before comparing the two for reconciliation.

First, here are my imports:

from csv import reader, writer, QUOTE_MINIMAL
import logging
from os import getcwd, mkdir, path
from sys import argv
from datetime import date
from types import IntType, FloatType 

Next, here is the actual reconciliation step itself:

def _recon_totals(self):
        """
        Reconcile the check total amount and document count and write out the file name,
        check numbers, vendor names, and timestamp to weekly report.
        """

        # Client totals
        client_doc_count = int(self.header_data[0][6])
        client_check_tot = float(self.header_data[0][7])
        # Double check variable typing for reconciliation totals.
        logging.info('Document count is: {0}'.format(client_doc_count))
        # doc_var_type = type(client_doc_count)
        # assert doc_var_type is IntType, 'Doc count is not an integer: {0}'.format(
        #    doc_var_type) 
        logging.info('Check Total is: {0}'.format(client_check_tot))
        # check_var_type = type(client_check_tot)
        # assert check_var_type is FloatType, 'Check tot is not a float: {0}'.format(
        #    check_var_type)

        # RRD totals
        rrd_doc_count = 0
        rrd_check_tot = 0.0

        with open(self.rpt_of, 'a') as rpt_outfile:
            for transact in self.transact_data:
                row_type = transact[0]
                logging.debug('Transaction type is: {0}'.format(row_type))

                if row_type == 'P':
                    # Reconciliation
                    rrd_doc_count += 1
                    trans_chk_amt = float(transact[12])
                    # trans_chk_type = type(trans_chk_amt)
                    # assert trans_chk_type is FloatType, 'Transaction Check Total is '\
                    #                                     'not a float: {0}'.format(
                    #                                         trans_chk_type)
                    rrd_check_tot += trans_chk_amt
                    # Reporting
                    vend_name = transact[2]
                    file_name = self.infile.split('/')[-1]
                    print('File name', file_name)
                    check_num = transact[9]
                    cur_time = date.today()
                    rpt_outfile.write('{0:<50}{1:<50}{2:<30}{3}\n'.format(file_name,
                                                                          vend_name,
                                                                          check_num,
                                                                          cur_time))
        # Reconcile totals and return the lists for writing if they are correct
        # if (client_doc_count, client_check_tot) == (rrd_doc_count, rrd_check_tot):
        #     logging.info('Recon totals match!')
        if client_doc_count == rrd_doc_count and client_check_tot == rrd_check_tot:
        #     logging.info('Recon totals match!')
            return True

        else:
            raise ValueError('Recon totals do not match! Client: {0} {1} {2} {3}\n'
                             'RRD {4} {5} {6} {7}'.format(client_doc_count,
                                                          client_check_tot,
                                                          type(client_doc_count),
                                                          type(client_check_tot),
                                                          rrd_doc_count,
                                                          rrd_check_tot,
                                                          type(rrd_doc_count),
                                                          type(rrd_check_tot)))

I have 6 files I am running, 4 of which ran through fine (passed reconciliation), and then 2 that failed. This is normal, clients give us bad data, except for the fact that I cannot find anything in the data indicating this is an error. Even my stack call shows that these the client totals and my totals should be reconciling:

ValueError: Recon totals do not match! Client: 2 8739.54 <type 'int'> <type 'float'>
RRD 2 8739.54 <type 'int'> <type 'float'>

I've tried two different ways of writing the statement that checks the two, and I get equal results (to be expected).

Finally, here is an (modified, except for pertinent fields) example of the data fields in question (this is the header record with their counts):

"H","XXX","XXX","XXX","XXX","XXX","2","8739.54","","","","","","","","","","","","","","","",""

Then here are the lines I reconcile on:

"P","XXX","XXX","XXX","","XXX","XXX","XXX","XXX","XXX","XXX","XXX","846.80",...(more fields that aren't pertinent)
"P","XXX","XXX","XXX","","XXX","XXX","XXX","XXX","XXX","XXX","XXX","7892.74",...(more fields that aren't pertinent)

For every "P" record I increment my doc count, and then I add the non "XXX" field into a running total.

In sum, any help on this would be greatly appreciated, I can see no logical error that I have made.

flybonzai
  • 3,763
  • 11
  • 38
  • 72
  • 2
    You should try using [decimal](https://docs.python.org/2/library/decimal.html) for dealing with money. Floating point has inherent problems with base-10 numbers and the decimal types will remove these problems. – Los Frijoles Jan 14 '16 at 19:07

2 Answers2

2

I disagree with the answer suggesting a margin for error. This is unreliable (since the margin would change with the number of floats that you are summing) and really doesn't seem like a good solution. It reminds me of the movie Office Space where they simply chop off fractions of pennies during transactions and divert them into another bank account (your margin for error).

However I would definitely agree with the suggestion doing the check to make sure this really is a floating point error by using subtraction.

I would abandon float altogether and use the decimal library. All you would need to do is just replace all of your float constructors with Decimal constructors:

from decimal import Decimal


def _recon_totals(self):
    """
    Reconcile the check total amount and document count and write out the file name,
    check numbers, vendor names, and timestamp to weekly report.
    """

    # Client totals
    client_doc_count = int(self.header_data[0][6])
    client_check_tot = Decimal(self.header_data[0][7])
    # Double check variable typing for reconciliation totals.
    logging.info('Document count is: {0}'.format(client_doc_count))
    # doc_var_type = type(client_doc_count)
    # assert doc_var_type is IntType, 'Doc count is not an integer: {0}'.format(
    #    doc_var_type) 
    logging.info('Check Total is: {0}'.format(client_check_tot))

    # RRD totals
    rrd_doc_count = 0
    rrd_check_tot = Decimal(0.0)

    with open(self.rpt_of, 'a') as rpt_outfile:
        for transact in self.transact_data:
            row_type = transact[0]
            logging.debug('Transaction type is: {0}'.format(row_type))

            if row_type == 'P':
                # Reconciliation
                rrd_doc_count += 1
                trans_chk_amt = Decimal(transact[12])                           trans_chk_type)
                rrd_check_tot += trans_chk_amt
                # Reporting
                vend_name = transact[2]
                file_name = self.infile.split('/')[-1]
                print('File name', file_name)
                check_num = transact[9]
                cur_time = date.today()
                rpt_outfile.write('{0:<50}{1:<50}{2:<30}{3}\n'.format(file_name,
                                                                      vend_name,
                                                                      check_num,
                                                                      cur_time))
    # Reconcile totals and return the lists for writing if they are correct
    # if (client_doc_count, client_check_tot) == (rrd_doc_count, rrd_check_tot):
    #     logging.info('Recon totals match!')
    if client_doc_count == rrd_doc_count and client_check_tot == rrd_check_tot:
    #     logging.info('Recon totals match!')
        return True

    else:
        raise ValueError('Recon totals do not match! Client: {0} {1} {2} {3}\n'
                         'RRD {4} {5} {6} {7}'.format(client_doc_count,
                                                      client_check_tot,
                                                      type(client_doc_count),
                                                      type(client_check_tot),
                                                      rrd_doc_count,
                                                      rrd_check_tot,
                                                      type(rrd_doc_count),
                                                      type(rrd_check_tot)))

Decimals work by storing the number as base-10 rather than base-2 like floats do. Here are some examples of floating point inaccuracies. Now, since all of our money is generally transacted using base-10, it would only make sense to use a base-10 notation for manipulating it rather than doing a lossy conversion to base-2 and then back to base-10.

Community
  • 1
  • 1
Los Frijoles
  • 4,771
  • 5
  • 30
  • 49
0

I would not rely on floating point equality checks for real-world data, because floating point math is imprecise in all sorts of weird ways. I suggest first making sure that this discrepancy is caused by floating point imprecision by printing the difference between the two values you are comparing and making sure that it is very, very small compared to the numbers you're working with. Then I suggest defining a margin of error within which two totals are considered effectively equal; for real-world money, half a cent seems like a natural value for this tolerance.

A_K
  • 457
  • 3
  • 11