FIX Protocol: Receiving out of Sequence Message during retransmission causes loop in retransmission

Question

I have a FIX client using QuickFIX/n as the FIX layer.

If for some technical reason my client gets disconnected, the FIX server will continue sending messages until it notices the client is no longer present (through heartbeats, I assume).

When my client reconnects, it notices the gap on the first message it receives. For instance, if the last message my client received had SeqNum=124 and the server sends SeqNum=151 upon reconnection, the server sent messages 125 to 150 before becoming aware of the disconnect.

My issue happens afterwards. My client sends a ResendRequest (35=2) with BeginSeqNo (7)=125 and EndSeqNo (16)=0 (give me everything). During this retransmission, and before it finishes, the FIX server sends me a new message with SeqNum=152.

So what my client sees is:

- Disconnects with last message 124
- Reconnects 
- Receive 151
- Ask for Resend from 125 to 0 (everything after 125)
- Receive 125
- Receive 126
- Receive 127
- Receive 152 (35=8) <-- this makes the retransmission abort on my side
- Ask For resend from 128 to 0
---> if the number of messages to resend is too high and new messages keep coming in,
     my client never manages to get the full retransmission in one go.
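
The sequence above can be reproduced with a small simulation. This is only a sketch of the behaviour as I observe it, not QuickFIX/n code: the function name, its parameters, and the assumption that the server emits one live message after every few replayed ones are all hypothetical.

```python
def simulate_naive_client(last_received, first_live, replayed_per_live):
    """Return the BeginSeqNo of every ResendRequest a naive client sends.

    Hypothetical model: the server replays `replayed_per_live` messages,
    then sends one live message, and the client restarts its request
    whenever it sees a sequence number that is too high.
    """
    if replayed_per_live < 2:
        raise ValueError("the gap never shrinks unless replayed_per_live >= 2")

    expected = last_received + 1   # next sequence number the client needs
    next_live = first_live         # next live sequence number from the server
    requests = []

    # The first live message after reconnect exposes the gap.
    next_live += 1
    requests.append(expected)

    while expected < next_live:
        # The server replays a few messages in order...
        expected += min(replayed_per_live, next_live - expected)
        if expected < next_live:
            # ...then a live message arrives. It is too high, so the naive
            # client drops it and asks again from `expected`.
            next_live += 1
            requests.append(expected)
    return requests


requests = simulate_naive_client(last_received=124, first_live=151, replayed_per_live=3)
print(requests[:3], len(requests))
```

With these numbers the first two requests start at 125 and 128, as in the list above, and the client needs many requests to close a gap that a single one would have covered.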

When talking with the other party (responsible for the server), they say it's OK to continue sending new messages during retransmission and that I should cache them until retransmission is finished.

It seems like it's not the way QuickFIX/n implemented this (I found no option to handle this specific case) but when looking at FIX documentation I can't find any info about this cache procedure. I assume also that this cache procedure is quite complex as I should probably cache for a given time (otherwise I may wait forever for missing messages).

My question is simple: What is this cache procedure and where can I find specs about it? And, is this handled by QuickFIX libraries or should I implement something specific on top of it?

I am not that familiar with QuickFIX/n, but did you make sure to use the most recent version? — Christoph John, Mar 09 '20 at 10:23
Unfortunately, QuickFIX/n releases are not that great. I'm using version 1.8.0 (from 2018-01-31). There is a new release, 1.9.0 (2019-10-31), but it has not been published to NuGet. Even release 1.8.0 on NuGet is unofficial. I did check all issues (open and closed) on GitHub and did not find anything related to this anyway. — Bruno Belmondo, Mar 09 '20 at 10:31
First I need confirmation that people doing FIX are doing what my third party is telling me to do. — Bruno Belmondo, Mar 09 '20 at 10:31
What is message 152 in your example? Is it a SequenceReset message or a normal app message? — Christoph John, Mar 09 '20 at 10:53
Basically (and simplified) the FIX spec says that a FIX engine should send a ResendRequest as long as the last received message sequence number is higher than the expected sequence number. Quickly browsing the spec I also did not find a "cache" mentioned. But given the fact that it is quite inefficient to re-request messages that were already received the FIX engines use a cache to buffer the messages until the gap has been filled completely. — Christoph John, Mar 09 '20 at 11:37
Could you point me to this behavior in the QuickFIX/J sources? I assume I will be able to understand the limits implied by this mechanism by reading the code. — Bruno Belmondo, Mar 09 '20 at 11:44
This is done in the nextQueued() methods that QFJ as well as QuickFIX/N has. Basically it is checked on every received message if there are still queued messages until the gap has been filled. E.g. here https://github.com/connamara/quickfixn/blob/master/QuickFIXn/Session.cs#L1614 — Christoph John, Mar 09 '20 at 11:47
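
The queueing idea described in the comments above can be sketched as follows. This is a simplified illustration, not the actual `nextQueued()` implementation: the class and its members are hypothetical, messages are reduced to their sequence numbers, and PossDupFlag and SequenceReset handling are left out.

```python
class QueueingReceiver:
    """Hold back too-high messages until the gap before them is filled."""

    def __init__(self, next_expected):
        self.next_expected = next_expected
        self.queued = set()         # too-high sequence numbers held back
        self.delivered = []         # sequence numbers passed on, in order
        self.resend_requests = []   # (BeginSeqNo, EndSeqNo) pairs sent

    def on_message(self, seq):
        if seq < self.next_expected:
            return                  # already processed; ignore the duplicate
        if seq > self.next_expected:
            # Gap detected: ask once, then only queue later messages.
            if not self.queued:
                self.resend_requests.append((self.next_expected, 0))
            self.queued.add(seq)
            return
        self._deliver(seq)
        # Drain queued messages that are now in sequence.
        while self.next_expected in self.queued:
            self.queued.remove(self.next_expected)
            self._deliver(self.next_expected)

    def _deliver(self, seq):
        self.delivered.append(seq)
        self.next_expected = seq + 1


receiver = QueueingReceiver(next_expected=125)
# 151 arrives first, the replay starts, 152 arrives mid-replay, the replay ends.
for seq in [151, 125, 126, 127, 152] + list(range(128, 151)):
    receiver.on_message(seq)
print(receiver.resend_requests, receiver.delivered[-1])
```

In this sketch the live message 152 is kept in the queue instead of triggering a second request, so one ResendRequest is enough to close the gap.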
QuickFIX/n has never officially released a NuGet package. The ones you found were made by someone else. — Grant Birchmeier, Mar 10 '20 at 18:43
We may have fixed this issue in 1.9. Please download the latest dlls and try them. — Grant Birchmeier, Mar 11 '20 at 13:44
Thanks for your help. It seems that the easiest way to solve the issue is to set SEND_REDUNDANT_RESENDREQUESTS to false. This option exists in both QuickFIX/J and QuickFIX/n and was suggested by my other party. @Grant We see that you are very active these days. Thank you so much for your work on QuickFIX/n, it is very helpful. — Bruno Belmondo, Mar 12 '20 at 22:29
Glad you solved it. You should post that as an answer to this question. (And yes, working hard to clear our PR and Issue backlog. Working toward a 1.10!) — Grant Birchmeier, Mar 13 '20 at 14:27

score 1 · Accepted Answer · edited Mar 16 '20 at 07:07

When digging a bit more we finally found out that the real issue was my client asking again and again for the same retransmission.

For instance, if I'm 4000 sequence numbers behind and I send a new ResendRequest each time there is a sequence discrepancy (let's say every 10 messages), I may end up asking 500 times for more than 1000 messages on average.
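
A rough check of that estimate, as a sketch only. It assumes a simplified model in which no new live messages arrive while the client catches up, so its figures differ somewhat from the ones quoted above; the function name and parameters are made up for illustration.

```python
def redundant_resend_cost(gap, resend_every):
    """Count the requests and messages asked for when the client re-requests
    everything still missing after every `resend_every` received messages."""
    # Size of the remaining gap at the moment each request is sent.
    remaining = list(range(gap, 0, -resend_every))
    requests = len(remaining)
    total_requested = sum(remaining)
    return requests, total_requested, total_requested / requests


requests, total, average = redundant_resend_cost(gap=4000, resend_every=10)
print(requests, total, average)
```

Under these assumptions the client sends 400 requests covering 802,000 messages in total, about 2,000 per request, where a single request for 4,000 messages would have been enough.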

This puts a heavy load on the server side and only makes things worse.

There is an option in QuickFIX/J which is also available in QuickFIX/N (but undocumented on this one): SendRedundantResendRequests. By setting it to false you make sure your client does not ask twice for the same retransmission. This greatly lowers the pressure on the server and eases the reconnection.
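
In the session settings file this would look roughly like the fragment below. The setting name comes from the option above; placing it in the `[DEFAULT]` section and writing the boolean as `N` follow the usual QuickFIX settings conventions and should be checked against your own configuration.

```ini
[DEFAULT]
# Do not send another ResendRequest while an earlier one already covers the gap.
SendRedundantResendRequests=N
```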

This setting is `false` by default (at least in QFJ). I certainly hope it is also `false` by default in QuickFIX/N. — Christoph John, Mar 15 '20 at 21:38
