I think you have pointed out one weakness of ActiveMQ itself: it cannot guarantee that consumers are really strict when consuming messages.
We have similar problems with our ActiveMQ (5.10.7), because it seems KahaDB suffers from a kind of "disk fragmentation", and we noticed this can come from at least two issues with consumers:
Case 1: Slow consumer
We have a consumer in our system which cannot consume many messages at once. If even a single unconsumed message remains in a KahaDB page, that whole page is kept (together with all the other messages on it which have already been consumed and acknowledged).
To prevent the KahaDB storage from reaching 100% (which would slow down the producers), we transfer the messages to a temporary queue on another ActiveMQ instance, like this:
// drain the blocked queue into a temporary queue on a second broker
from("activemqPROD:queue:BIG_QUEUE_UNCONSUMED")
    .to("activemqTEMP:queue:TEMP_BIG_QUEUE");
Then (after stopping the first route) we push them back:
// refill the original queue from the temporary broker
from("activemqTEMP:queue:TEMP_BIG_QUEUE")
    .to("activemqPROD:queue:BIG_QUEUE_UNCONSUMED");
The alternative is to store the messages on the file system and then reload them, but that way you lose the JMS (and custom) headers. With the temporary-queue solution, all headers are kept.
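For reference, the two brokers can be wired as two separate Camel components. Here is a Spring XML sketch: the bean ids match the endpoint prefixes above, but the broker URLs and the activemq-camel component class are assumptions about your setup.

```xml
<!-- one ActiveMQ Camel component per broker; the URLs are placeholders -->
<bean id="activemqPROD" class="org.apache.activemq.camel.component.ActiveMQComponent">
  <property name="brokerURL" value="tcp://prod-broker:61616"/>
</bean>
<bean id="activemqTEMP" class="org.apache.activemq.camel.component.ActiveMQComponent">
  <property name="brokerURL" value="tcp://temp-broker:61616"/>
</bean>

<camelContext xmlns="http://camel.apache.org/schema/spring">
  <!-- drain the blocked queue into the temporary broker -->
  <route>
    <from uri="activemqPROD:queue:BIG_QUEUE_UNCONSUMED"/>
    <to uri="activemqTEMP:queue:TEMP_BIG_QUEUE"/>
  </route>
</camelContext>
```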
Case 2: Consumer that never acknowledges
Sometimes, even after we perform the previous operation and all unconsumed queues are empty, the storage stays above 0%.
By looking into the KahaDB files we can see that pages are still present even though there are no more messages in any of the QUEUES.
For the TOPICS, we stopped using durable subscriptions, so the storage should also stay at 0%.
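If old durable subscriptions are still registered on the broker, one possible cleanup is to let ActiveMQ drop subscribers that stay offline too long. An activemq.xml sketch; the two timeout values here are assumed examples, check that these broker attributes exist in your version:

```xml
<!-- remove durable subscribers offline for more than 24h, checked every hour -->
<broker xmlns="http://activemq.apache.org/schema/core"
        offlineDurableSubscriberTimeout="86400000"
        offlineDurableSubscriberTaskSchedule="3600000">
  <!-- ... rest of the broker configuration ... -->
</broker>
```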
The probable cause (this is a supposition, but one we hold with strong confidence) is that some of the consumed messages were never properly acknowledged.
We think this is the cause because we can still see messages like this in the logs:
"not removing data file: 12345 as contained ack(s) refer to referenced file: [12344, 12345]"
This can happen, for example, when a consumer disconnects abruptly (it consumed some messages but disconnected before sending the acknowledgement).
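To confirm which destinations are pinning those journal files, you can raise the KahaDB message-database logger to TRACE, so the broker logs on each cleanup run why each data file is kept or removed. A log4j fragment; the logger name comes from the ActiveMQ documentation, verify it for your version:

```properties
# log4j.properties: explain why KahaDB keeps (or removes) each journal data file
log4j.logger.org.apache.activemq.store.kahadb.MessageDatabase=TRACE
```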
In our case the messages never expire, so this could also contribute to the problem. However, it is not clear whether setting an expiration would destroy non-acknowledged messages.
Because we do not want to lose any event, there is no expiration time for these specific queues.
Judging by your question, you seem to be in the second case, so our solution is:
- Make sure no more producers / consumers are connected to ActiveMQ
- Make sure all queues and durable topics are empty
- Delete all files in the KahaDB storage directory (from the file system)
- Restart ActiveMQ (fresh)
Unfortunately we did not find a better way to deal with these cases; if someone has a better alternative, we would be happy to hear it.
This article can also give you some solutions (such as setting an expiry policy for the ActiveMQ.DLQ queue).
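As one sketch of that kind of policy, the destination policy can tell the dead-letter strategy to discard expired messages instead of moving them to ActiveMQ.DLQ. An activemq.xml fragment; applying it to every queue via ">" is an assumption, adapt the selector to your needs:

```xml
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <!-- do not move expired messages to ActiveMQ.DLQ; discard them instead -->
      <policyEntry queue=">">
        <deadLetterStrategy>
          <sharedDeadLetterStrategy processExpired="false"/>
        </deadLetterStrategy>
      </policyEntry>
    </policyEntries>
  </policyMap>
</destinationPolicy>
```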