8

I would like to notice that the scenarion I will describe happen rare enough and in most cases everything works as expected.

I have 1 topic and 1 subscription on Pub/Sub side.

My java application listens for subscription, does some processing and sends acknowledge back. Because of fact that google Pub/Sub guarantees at least once delivery, we do message deduplication on our side based on objectGeneration header and 'objectId' header.

Sometimes we see that message that was acknowldged is accepted by our application again and again and it is unexpected behaviour.

Log example:

//first
2019-12-17 20:51:57.375 INFO 1 --- [sub-subscriber3] bucketNotificationFlow : Received new message from pub-sub: GenericMessage [payload={....}, headers={.....objectGeneration=1576615916875106, eventTime=2019-12-17T20:51:56.874940Z, objectId=Small_files_bunch/100_12_1.csv, ....
....
2019-12-17 20:51:57.698 INFO 1 --- [sub-subscriber3] .i.g.PubSubMessageAcknowledgementHandler : Acknowledged message - 1576615916875106
...
//duplicate 1
2019-12-17 20:51:59.663 INFO 1 --- [sub-subscriber4] bucketNotificationFlow : Received new message from pub-sub: GenericMessage [payload={...}, headers={ objectGeneration=1576615916875106, eventTime=2019-12-17T20:51:56.874940Z, objectId=Small_files_bunch/100_12_1.csv", ....
...
2019-12-17 20:51:59.704 INFO 1 --- [sub-subscriber4] c.b.m.i.DiscardedMessagesHandler : Duplicate message received GenericMessage [ headers={idempotent.keys=[objectGeneration.1576615916875106, objectId.Small_files_bunch/100_12_1.csv], ...
....
//duplicate 2
2019-12-17 22:52:02.239 INFO 1 --- [sub-subscriber1] bucketNotificationFlow : Received new message from pub-sub: GenericMessage [payload={...}, headers={objectGeneration=1576615916875106, eventTime=2019-12-17T20:51:56.874940Z, objectId=Small_files_bunch/100_12_1.csv, ...
...
2019-12-17 22:52:02.339 INFO 1 --- [sub-subscriber1] c.b.m.i.DiscardedMessagesHandler : Duplicate message received GenericMessage [ headers={idempotent.keys=[objectGeneration.1576615916875106, objectId.Small_files_bunch/100_12_1.csv], ...

// and so on each 2 hours

Code for acknowledgement:

var generation = message.getHeaders().get("objectGeneration");
pubSubMessage = message.getHeaders().get(GcpPubSubHeaders.ORIGINAL_MESSAGE, BasicAcknowledgeablePubsubMessage.class)
pubSubMessage.ack().addCallback(
        v -> {
            removeFromIdempotentStore(targetMessage, false);
            log.info("Acknowledged message - {}", generation); //from logs we see that this line was invoked
        },
        e -> {
            removeFromIdempotentStore(targetMessage, false);
            log.error("Failed to acknowledge message - {}", generation, e);
        }
);

GCP subscription page contains following diagram:

enter image description here

StackDriver acknowledge diagram: enter image description here

Any ideas what is going on, how to troubleshoot it and fix it ?

gstackoverflow
  • 36,709
  • 117
  • 359
  • 710

2 Answers2

1

Try checking Stackdriver to see if you are missing acknowledgement deadlines.

The two hour wait time between duplicates is very interesting. Have you tried expanding your message deadline before? (Info on this is at the above link.)

  • Why 2 hours ? because I set **max-ack-extension-period: 7200** – gstackoverflow Dec 24 '19 at 09:23
  • Acknowledgement deadline is 600 Seconds – gstackoverflow Dec 24 '19 at 09:24
  • I found a thread on the [GitHub Google Cloud Client Libraries for Go](https://github.com/GoogleCloudPlatform/google-cloud-go/issues/778#issuecomment-363571402) describing your issue. I wasn't able to find the same on the Java client library. Try reducing your message deadline to less than 600. – Maximus Macdonald Dec 25 '19 at 01:20
  • Initially message deadline was 10 sec(default value) and it was attempt to fix the issue when I set 600 sec, so message deadline doesn't matter for that issue – gstackoverflow Dec 25 '19 at 07:25
  • Did you check your Stackdriver logs to see if you are missing acknowledgement deadlines? – Maximus Macdonald Dec 27 '19 at 04:41
  • http://dl3.joxi.net/drive/2019/12/27/0005/3037/338909/09/bda03c0e46.jpg – gstackoverflow Dec 27 '19 at 14:43
  • looks like everuthing is blue(success) on the diagram(added diagram to the topic too) – gstackoverflow Dec 27 '19 at 14:45
  • More detailed picture where you can see that I use correct filters: http://dl3.joxi.net/drive/2019/12/27/0005/3037/338909/09/7eae241c39.jpg – gstackoverflow Dec 27 '19 at 14:49
  • Do you have any ideas? – gstackoverflow Dec 30 '19 at 21:32
  • I spent most of my day yesterday trying to reproduce this issue and I was unable to get the intended results. Here is my main idea as of now: Judging by your "oldest unacked message age" metric, one of your messages is not being acknowledged for 2 hours. This we already know. Because your stackdriver logs only show successes, this leads me to believe that the original acknowledgement is not being sent back to pubsub. – Maximus Macdonald Dec 30 '19 at 22:39
  • And I am aware that your logs show that the message was acknowledged. And I know this is extremely unlikely, but I have been unable find any other cases similar to this where this solution was not the answer. You mentioned that this behavior only happens sometimes. Have you noticed any patterns? Like when a specific function is called? – Maximus Macdonald Dec 30 '19 at 22:43
  • Let me share with you a bit more context. I have bucket which generates notification and sends it to the topic when I put file into it. Application listens corresponding subscription, processes file and the sends acknowledge to the Pub/Sub. This issue reproduces when I send thousands of files of different sizes(most of files have size 3 kb but 12 files has size 190MB). My application is powerful enough to make sure that time between notification generation and acknowledge much less than 2 hours(even for huge files) – gstackoverflow Dec 31 '19 at 09:29
  • Okay, it sounds like we both agree on the issue not being with job processing and timeouts. And now I know that this only occurs when a large number (thousands) of files are sent to your bucket. With this in mind, the first duplicate message you receive in your logs is very close to when the first message is received. Can you try acknowledging the first message when it is received, and then not processing the message? – Maximus Macdonald Jan 01 '20 at 21:45
  • You mean to acknowledge first *duplicated* message without processing? I've already implemented this workaround and it works – gstackoverflow Jan 01 '20 at 22:23
  • Okay so you have already implemented this. It seems the best course of action here would be to [create an issue](https://cloud.google.com/support/docs/issue-trackers#trackers-list) with Google Cloud Platform. – Maximus Macdonald Jan 06 '20 at 21:36
0

See more info here: How to cleanup the JdbcMetadataStore?

According our conclusion, it would be better do not remove entries from metadata store table immediately after processing. Some external job should do the trick from time to time and only for those entries which are old enough for remove and we are definitely sure that Pub/Sub won't redeliver to us the same message anymore.

Artem Bilan
  • 113,505
  • 11
  • 91
  • 118
  • yes, it is my current state of things. But I thunk that it is a workaround. I suppose I configured google pub/sub or spring-cloud-gcp in a wrong way. Highly unlikely but it might be a bug in the library or pub/sub. Redelivery is ok and it is expected anomaly. But in our case successful callback is invoked but message was not removed from subscription and we get it again and again. So we can't just ignore duplicated we have to do repeated acknowledge to remove message from subscription – gstackoverflow Dec 26 '19 at 17:30
  • **again and again** means endless – gstackoverflow Dec 26 '19 at 17:42