
At the moment I'm working on an import script that imports bank account data into a MySQL database using PHP. I found some MT940 parsers, but they didn't do what I wanted or didn't meet the current MT940 standard. So I wrote my own simple class that parses the data I need.

The problem now is, and this might not even be an MT940 issue, that I must filter out double transactions. In principle this is very simple: if exactly the same transaction already exists in the database, don't import it again. So that's what I have done.
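In essence, my current check looks something like this (a simplified sketch; the table and column names are just for illustration, not my real schema):

```php
<?php
// Simplified sketch of my current duplicate check.
// Table and column names are just for illustration.
function transactionExists(PDO $db, array $trx)
{
    $sql = "SELECT COUNT(*) FROM transactions
            WHERE trx_date = :trx_date
              AND amount = :amount
              AND account_number = :account_number
              AND description = :description";

    $stmt = $db->prepare($sql);
    $stmt->execute([
        ':trx_date'       => $trx['date'],          // YYMMDD from the :61: record
        ':amount'         => $trx['amount'],
        ':account_number' => $trx['account_number'],
        ':description'    => $trx['description'],
    ]);

    return (int) $stmt->fetchColumn() > 0;
}

// During import, anything that already exists is skipped:
// if (!transactionExists($db, $trx)) { insertTransaction($db, $trx); }
```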

But now the fun part: some transactions might happen twice on the same day. For example, someone makes the same payment twice on the same day [someone might like me and just send me the money twice ;)]. While importing this for the first time, there is no problem: within one file, each transaction counts as a separate transaction.

But now the problem: because the transactions are not unique (MT940 doesn't provide any unique identifier per transaction), it is hard to tell a double transaction apart from a genuine one. Say I have downloaded two MT940 files from my bank account, and one of the two identical transactions is in the first file while the other is in the second file. When importing the second file, my script will now tell me that the transaction is a double transaction.

So... I'm struggling with this, and I'm probably not the only one. Big accountancy programs also use MT940 files and must deal with the same issue.

Does anyone know a way to handle this?

Additional information

In the transaction itself (:61:), the following information is available:

  • Date
  • Amount
  • Bank account number
  • Bank account name
  • Bank account address
  • Transaction description

It can happen that when someone makes the same payment twice, all of this information is identical.

Example of a double transaction (of course I modified the name and bank account): this record appeared twice in one MT940 file, but they really are two separate transactions.

:61:130311C000000000029,95N122NONREF          
0123456789
:86:/ORDP//NAME/JANSEN W H/ADDR/SOMEROAD 1 9569 GS THECITY/REMI/N
OKIA 3310/ISDT/2013-03-11
Timo002
  • If every single field is the same, I think that no algorithm can decide if it's a double or merely a new, identical transaction. – STT LCU Jul 22 '13 at 09:22
  • PS: Is the transaction date a timestamp, or just DD/MM/YYYY? – STT LCU Jul 22 '13 at 09:24
  • No timestamp, just YYMMDD. – Timo002 Jul 22 '13 at 09:26
  • Are you absolutely, positively sure that the genuine identical transactions are equivalent, bit by bit? – STT LCU Jul 22 '13 at 09:36
  • @STTLCU: Yes, identical bit by bit, see my addition in the post! – Timo002 Jul 22 '13 at 10:00
  • Then how can you ever tell whether they're duplicates or repeated transactions? This problem is unsolvable. – STT LCU Jul 22 '13 at 10:03
  • I think so too, but I can't imagine that I'm the only one struggling with this issue. Large banking software uses the same import/export methods. I also discovered that there are other software programs that struggle with this. – Timo002 Jul 22 '13 at 10:06
  • Based on this https://www.rabotransact.com/rtpubeu/ContentIntegrationPortlet/images/Format%20Description%20MT940%202%200%20EN_tcm79-179785.pdf reference, shouldn't "Record :28: Statement Number/Sequence Number" be different on repeated transactions? – STT LCU Jul 22 '13 at 10:10
  • No (I'm also doing this for a Rabobank export), the transactions are in the :61: records. :28: is a statement number, in my case :28C:, and it is always zero. Also, one statement number holds around 200 transactions (:61:). – Timo002 Jul 22 '13 at 10:21
  • What about record :20:, which is mandatory? – STT LCU Jul 22 '13 at 10:29
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/33896/discussion-between-timo002-and-stt-lcu) – Timo002 Jul 22 '13 at 10:31

3 Answers


Let's say that you have 3 transactions: A, B and C.

A and B are duplicates; therefore B must be discarded.

C is a different transaction, but it's identical to A in every aspect.

Based on the data you provided in the question, A = B = C: they're absolutely identical. There is no way to determine, by reading their content, which is genuine and which is a duplicate. Every one of the following scenarios may be valid:

  1. C is a duplicate of B, which is a duplicate of A. Discard B and C and keep A.
  2. B is a genuine transaction identical to A. C is a duplicate of B. Discard B.
  3. They're three genuine identical transactions. Keep all of them.
  4. [etc...]

As you can see, there's no algorithm that can decide with 100% certainty which case is the correct one. And since we're talking about money, anything less than 100% certainty can't be accepted.

What to do then?

If a computer can't determine it, let the humans kick in. When you detect a conflict, add it to a list. Process everything that does not have conflicts, and when you've finished the batch, send the conflict list to a human operator and let him/her sort out the duplicates from the genuine transactions (even if I still wonder HOW they can sort them if they're identical in every aspect).

If you find out that the operators know HOW to sort them, try to deduce their human algorithm, if possible.
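A rough sketch of what that could look like (the helper names `transactionExists`, `insertTransaction` and `queueForManualReview` are purely illustrative, not part of any MT940 library):

```php
<?php
// Rough sketch: import everything that is unambiguous, collect the rest
// for a human operator. Helper names are illustrative only.
function importBatch(PDO $db, array $transactions)
{
    $conflicts = [];

    foreach ($transactions as $trx) {
        if (transactionExists($db, $trx)) {
            // An identical row is already in the database: we cannot tell
            // whether this is a duplicate import or a genuine repeated payment.
            $conflicts[] = $trx;
        } else {
            insertTransaction($db, $trx);
        }
    }

    if ($conflicts !== []) {
        // e.g. write them to a review table or e-mail them to an operator.
        queueForManualReview($db, $conflicts);
    }
}
```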

STT LCU
  • This is also what I was thinking of. The only problem now is that the import overlaps by one day (which has to be there to be sure not to miss any transactions between two consecutive days), so they must manually check around 200 transactions for every import, and probably 199 of them are valid doubles... But thinking it through, I guess that's just the way it is! – Timo002 Jul 22 '13 at 10:04

The MT940 statement message has opening and closing balances, right? So if you have a statement containing two identical transactions, the bank account balance is going to reflect that.

If you 'clean up' data that appears to be duplicated, that doesn't mean the transactions didn't happen. If you are doing this clean-up in order to fit your data from the 940 into a database with a unique constraint that disallows identical transactions from being posted, that could be a problem later - or immediately, if anyone compares the sum of the transactions in your database against the closing balance from the statement.

A better approach might be to generate some additional key information (perhaps increment a sequence number for each statement-line) that you can add to differentiate otherwise apparently identical transactions and load them into your db without violating the constraints.
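A sketch of the idea (assuming the parser already gives you the opening/closing balances and the list of :61: lines per statement; the `$statement` array layout is an assumption, not a fixed format):

```php
<?php
// Sketch: give every :61: line an artificial sequence key within its
// statement, and sanity-check the sum against the closing balance.
// The $statement array layout is an assumption about the parser's output.
function prepareStatement(array $statement)
{
    $sum = 0.0;

    foreach ($statement['transactions'] as $i => &$trx) {
        // Artificial key: statement reference (:20:/:28C:) plus line position.
        $trx['line_key'] = $statement['reference'] . '-' . ($i + 1);
        $sum += ($trx['debit_credit'] === 'D') ? -$trx['amount'] : $trx['amount'];
    }
    unset($trx);

    $expected = $statement['closing_balance'] - $statement['opening_balance'];
    if (abs($sum - $expected) > 0.005) {
        throw new RuntimeException('Transactions do not reconcile with the closing balance');
    }

    return $statement;
}
```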

Thomas Kimber

One answer pointed out that MT940 messages contain a starting (and ending) balance. Use the starting balance to maintain a running balance for each transaction as you go through the message. Use that running balance in conjunction with the value date of the transaction, and perhaps the account code, to create a unique identifier. You can do this by, for instance, concatenating them and using whatever function your programming language provides to create an MD5 hash.
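A minimal sketch of that idea in PHP (the field names are assumptions about what your parser returns):

```php
<?php
// Sketch: keep a running balance while walking through the statement and
// hash it together with the value date and account code.
// Field names are assumptions about what the parser returns.
function addTransactionIds(array $transactions, $openingBalance, $accountCode)
{
    $runningBalance = $openingBalance;

    foreach ($transactions as &$trx) {
        $amount = ($trx['debit_credit'] === 'D') ? -$trx['amount'] : $trx['amount'];
        $runningBalance += $amount;

        // Two otherwise identical transactions get different hashes, because
        // the second one is applied to a different running balance.
        $trx['unique_id'] = md5(
            $accountCode . '|' . $trx['value_date'] . '|' .
            number_format($runningBalance, 2, '.', '')
        );
    }
    unset($trx);

    return $transactions;
}
```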

  • This does not provide an answer to the question. Once you have sufficient [reputation](http://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](http://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](http://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/low-quality-posts/12350649) – Bono May 14 '16 at 09:43
  • OK, let me be a bit clearer. The use case is importing MT940 files into a database. You do this when you want to make bank statement data available via an MT940 import to, e.g., an accounting application, for reconciliation. You may do this for different and possibly overlapping time periods (monthly, quarterly), potentially reading the same data more than once. As the OP wrote, there is no unique identifier in an MT940 file at the transaction level. My suggestion shows how to create such an identifier, which then lets you identify duplicates, i.e. transactions you already have in your database. – ectoplasm May 15 '16 at 11:21