Is there any way to use MapReduce to replay a sequential financial transaction log?

Question

I'm working on a project that requires GAAP accounting conformance for deferred revenue collection. Cash is converted to a proprietary "currency" or credits. These credits may have different values based upon the purchase price at time of conversion (including exchange rate fluctuations).

As each purchase is made, the amount is deducted from the earliest "bucket" of credits until they are exhausted and the next eldest bucket is encountered. The valuation of the credits spent define the total revenue for that particular purchase... so it may differ for each product simply based upon the order of credits and debits / the value of each credit.

My sequential mind sees this as an easy imperative problem. With the data I'm capturing, I can replay the transaction logs and arrive at a daily total for revenue for each customer and each product... that can then be amortized for the services provided. No problem.

My data is in a sharded MongoDB cluster of replica sets. Dumping it and writing a post-processing script is easy, but I'd love to be able to report directly out of it.

I've played with simple MapReduce operations in the past. Is there any way to sequentially process a transaction log? While normal transaction logs can be processed in parallel, and I've done this, I just can't seem to find a way to twist MapReduce (MongoDB or Hadoop) to replay in order. I don't think it can be done. Am I wrong or just a n00b?

My only options at this point are post-processing replay, manipulating currency buckets at time of purchase or utilizing something I've overlooked... perhaps I'm not thinking enough outside the box here.

Any brief insights or pointers are greatly appreciated.

Wow, one application I wouldn't use mongoDB is for anything cash related. Are you sure this is the way to go? — ppeterka, Mar 05 '13 at 11:07
the cash conversion is handled and tracked externally. the credit transactions are initiated and tracked externally as well. both, isolated rdbms clusters. mongo is the only centralized datastore with all of the information in one place. it's not the end of the world if it takes us 24hrs to reconcile a discrepancy. — pestilence669, Mar 06 '13 at 05:47
Ok, I see. This makes perfect sense, and seems good! I'm sorry I assumed that this was one of the 'use mongodb for what it's not to be used for' situations... — ppeterka, Mar 06 '13 at 06:53
but... MongoDB is web scale! :) - http://www.youtube.com/watch?v=URJeuxI7kHo — pestilence669, Mar 06 '13 at 07:26
Transaction logs are like time series which can be processed by mapReduce like events (see [hierarchical-aggregation](http://docs.mongodb.org/manual/use-cases/hierarchical-aggregation/)). The input to mapReduce can be pre-selected and sorted (replay order). There is also the possibility of [incremental mapReduce](http://docs.mongodb.org/manual/applications/map-reduce/#incremental-map-reduce). If you'd like a more in-depth answer, please, post the specifics about your collections and the expected results. — ronasta, Mar 26 '13 at 09:55

Is there any way to use MapReduce to replay a sequential financial transaction log?

0 Answers0