
I'm having a very strange problem with CosmosDB & Azure Functions. I frequently delete my database and re-create it in DEV. I then re-deploy the function app. When I call the APIs in the app and CosmosDB triggers are invoked, I normally see the leases collection created. Here's a typical trigger:

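```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;   // Document type used by Cosmos DB extension 3.x
using Microsoft.Azure.WebJobs;     // FunctionName, CosmosDBTrigger, ExecutionContext

public static class MyTriggerFunction
{
    [FunctionName("MyTrigger")]
    public static async Task RunAsync([CosmosDBTrigger("MyDatabase", "MyContainer",
        ConnectionStringSetting = "CosmosConnectionString", LeaseCollectionName = "leases",
        LeaseCollectionPrefix = "MyTrigger", CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> documents,
        ExecutionContext executionContext)
    {
        // code
    }
}
```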

For some reason, the leases collection is no longer being created. I re-created the database, re-deployed the function app multiple times and made API calls with no luck. What am I missing?

EDIT: I looked at the logs and noticed a lot of `Microsoft.Azure.Documents.ChangeFeedProcessor.Exceptions.LeaseLostException` exceptions with the message `The lease was lost.`, so I'm not sure what's going on.

EDIT2: Here's a more detailed error message I was able to extract from the logs:

> Either the source collection 'MyContainer' (in database 'MyDatabase') or the lease collection 'leases' (in database 'MyDatabase') does not exist. Both collections must exist before the listener starts. To automatically create the lease collection, set 'CreateLeaseCollectionIfNotExists' to 'true'

Note that `CreateLeaseCollectionIfNotExists` is already set to `true`.

user246392
  • Can you share which version of the Functions Cosmos DB extension package you are using? – Matias Quaranta May 08 '20 at 19:13
  • Also, for the lease collection to be created, the Function runtime needs to initialize correctly. Please check your logs for runtime errors that stop the initialization. – Matias Quaranta May 08 '20 at 19:14
  • I'm using Microsoft.Azure.WebJobs.Extensions.CosmosDB 3.0.7 and Microsoft.Azure.Cosmos 3.9.0-preview and Microsoft.NET.Sdk.Functions 3.0.3. – user246392 May 08 '20 at 19:34
  • Where do I find the Function runtime logs? – user246392 May 08 '20 at 19:34
  • They would appear on the App Insights (if you are using that) https://learn.microsoft.com/en-us/azure/azure-functions/functions-monitoring?tabs=cmd. Also, if you browse on the Azure Portal to the Function, I believe there is a Logs pull-up option at the bottom of the screen. Have you tried running the Function locally to debug any initialization errors? – Matias Quaranta May 08 '20 at 19:49
  • Thanks. I see a lot of exceptions in the Logs panel. It's in the format of `The listener for function '{triggerName}' was unable to start.` I also see `The lease was lost.` message. I haven't made code changes that would affect anything, so I'm not sure what local debugging would give me. This issue has happened in the past but usually clears after waiting for a few hours. – user246392 May 08 '20 at 19:54
  • `The listener for function '{triggerName}' was unable to start.` is the message that should contain the reason. If the Function cannot start, the logic that creates the leases collection won't kick in. That message should include an exception; please share it. `The lease was lost` is a transient state and not important. Another point: are you using another Azure Function with the same leases collection? Both might be trying to start/use it. – Matias Quaranta May 08 '20 at 22:16
  • All of my functions listen to distinct collections, so no to your last question. I added more details to the question. Feel free to send me your email so I can provide the full logs. I was able to get around this issue by restarting the function app, but it has happened so many times in the past that I'd like a more reliable setup. – user246392 May 08 '20 at 22:38


The `Either the source collection...` error comes from here: https://github.com/Azure/azure-webjobs-sdk-extensions/blob/0683d1bd08a16680c70f982ad00c940b7e9c1fce/src/WebJobs.Extensions.CosmosDB/Trigger/CosmosDBTriggerListener.cs#L140, which reacts to a NotFound being detected while trying to start the Trigger process.

The key here is understanding that the lease collection is created during Function initialization, not while the Function is running.

If you delete the lease collection (or the monitored collection) while the Function is running, you might see that error pop up from the running instances. If a new instance comes up (due to scaling) or you restart the Function, the creation logic kicks in: https://github.com/Azure/azure-webjobs-sdk-extensions/blob/0683d1bd08a16680c70f982ad00c940b7e9c1fce/src/WebJobs.Extensions.CosmosDB/Trigger/CosmosDBTriggerAttributeBindingProvider.cs#L155.
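If your DEV workflow re-creates the database, one option (a sketch, not something this answer prescribes) is to create the leases container yourself as part of that workflow, so the "both collections must exist" precondition already holds when the listener next starts. It uses the Microsoft.Azure.Cosmos v3 SDK, which the question already references. The class name, the hard-coded names, the `CosmosConnectionString` environment variable and the `/id` lease partition key path are all assumptions. Because the state of an already-running listener after a delete is undefined, stopping and restarting the Function App is still the safer route.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical DEV setup step: run it after re-creating the database and before
// (re)starting the Function App, so the lease container already exists when the
// trigger's listener starts.
public static class DevCosmosSetup
{
    public const string DatabaseId = "MyDatabase";
    public const string LeaseContainerId = "leases";
    // Assumption: a lease container partitioned on "/id".
    public const string LeasePartitionKeyPath = "/id";

    public static async Task EnsureLeaseContainerAsync(CosmosClient client)
    {
        DatabaseResponse db = await client.CreateDatabaseIfNotExistsAsync(DatabaseId);
        // Returns the existing container instead of creating one if "leases" is already there.
        await db.Database.CreateContainerIfNotExistsAsync(
            new ContainerProperties(LeaseContainerId, LeasePartitionKeyPath));
    }

    public static async Task Main()
    {
        // Assumption: the same connection string the Function App reads from its
        // "CosmosConnectionString" setting, exposed here as an environment variable.
        string connectionString = Environment.GetEnvironmentVariable("CosmosConnectionString")
            ?? throw new InvalidOperationException("CosmosConnectionString is not set.");
        using CosmosClient client = new CosmosClient(connectionString);
        await EnsureLeaseContainerAsync(client);
    }
}
```

With `CreateLeaseCollectionIfNotExists = true` the runtime would create the container anyway on a clean start; creating it up front just removes the dependency on a restart happening at the right moment.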

So, when do these errors happen?

  1. Function initialization -> `CreateLeaseCollectionIfNotExists` checks for the leases collection and creates it if it is missing. If this fails, initialization stops here and an error message is produced.
  2. Function running -> Instances may already be running. If the lease collection is deleted, the resulting runtime errors make the Function retry starting the process; since the retry does not run initialization again, it outputs the `Either the source collection...` error.
  3. An occasional `The lease was lost` occurs in load-balancing scenarios, where multiple Function instances are running and distributing scaled load and a lease (from the lease collection) is reassigned to a new instance. It can also happen if the Trigger tries to update the checkpoint right after you delete the lease collection.

What you can do

If you are manually deleting the leases collection, then you are in control of what can happen. The recommendation is:

  1. Stop your Functions.
  2. Delete the leases collection.
  3. Start your Functions.
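Step 2 of that list can be scripted. A minimal sketch with the Microsoft.Azure.Cosmos v3 SDK, assuming the names from the question (`MyDatabase`, `leases`) and a hypothetical `CosmosConnectionString` environment variable; the `LeaseReset` class is illustrative. Steps 1 and 3 stay outside the program (Azure Portal, or `az functionapp stop` / `az functionapp start`).

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Sketch of step 2 only. Stop the Function App BEFORE running this, and start it
// again afterwards so initialization re-creates the leases container.
public static class LeaseReset
{
    public static async Task<bool> DeleteLeasesAsync(CosmosClient client, string databaseId, string leaseContainerId)
    {
        try
        {
            await client.GetContainer(databaseId, leaseContainerId).DeleteContainerAsync();
            return true;   // the leases container existed and was deleted
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
        {
            return false;  // nothing to delete
        }
    }

    public static async Task Main()
    {
        string connectionString = Environment.GetEnvironmentVariable("CosmosConnectionString")
            ?? throw new InvalidOperationException("CosmosConnectionString is not set.");
        using CosmosClient client = new CosmosClient(connectionString);
        bool deleted = await DeleteLeasesAsync(client, "MyDatabase", "leases");
        Console.WriteLine(deleted ? "Deleted 'leases'." : "'leases' did not exist.");
    }
}
```

Deleting the leases container also discards the stored checkpoints, so after step 3 the trigger starts reading from the current point of the change feed unless it sets `StartFromBeginning = true`.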

If you delete the lease store while the Function is running, without stopping it first, the Function's behavior is totally undefined.

Matias Quaranta
  • Wow. This is a great answer. It helped clarify a lot of things for me. I will follow your recommendations. Can you also briefly talk about what happens when a new collection and trigger are created while the functions app is running? In production, I won't be deleting collections, but I might add new ones. How can I ensure the triggers will work for those collections once I re-deploy the functions app? – user246392 May 08 '20 at 23:28
  • The key is how you deploy the new Trigger. If the Trigger is added as part of a Function App project that already exists, you are deploying the new bits (with the new Trigger), which causes the Function App to restart. The only effect is that the other Triggers stop consuming while the Function App restarts, and then resume from the last point where they were working (you won't lose changes, because the lease store contains the checkpoints). – Matias Quaranta May 11 '20 at 14:54
  • @MatiasQuaranta Thanks for this awesome explanation, but as a developer, I just want my function to be triggered; I don't mind managing the lease collection. Is that something on your roadmap with the Function App team? A simple solution could be cleaning up the lease collection on Function App init and creating fresh documents. – Mehdi Benmoha Feb 09 '21 at 11:17
  • @MehdiBenmoha not sure I follow. The scenario in this question is that the user was deleting the lease collection while the function was running. If you don't plan to delete any collection, then you'd not run into this issue. The Function Trigger automatically creates the leases collection if it doesn't exist so you don't need to manage it, you just need to not delete it. – Matias Quaranta Feb 09 '21 at 15:55
  • @MatiasQuaranta It happens every time I use Ctrl+C to stop the function from my VSCode terminal, or if the connection to CosmosDB breaks in some other way (not sure), and it happens only for a function that uses shared code from another directory. I don't know if it's related to the issue, but I have two functions, and the one that is not using the shared code is still triggered. – Mehdi Benmoha Feb 09 '21 at 16:26
  • @MatiasQuaranta I created a new post to make it more readable, and I think my issue is not really related to the OP's one: https://stackoverflow.com/questions/66123208/azure-function-app-cosmos-db-trigger-connection-drop – Mehdi Benmoha Feb 09 '21 at 16:40
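For the May 11 comment about adding a new Trigger to an existing Function App project, a minimal sketch of such a trigger, assuming extension 3.x (same attribute properties as the question) and a hypothetical new container `MyNewContainer`. The function, class and prefix names are illustrative. A distinct `LeaseCollectionPrefix` lets it share the `leases` collection with `MyTrigger`, and `StartFromBeginning` only matters the first time, before any leases exist for that prefix.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class MyNewTriggerFunction
{
    // Hypothetical trigger for a newly added container "MyNewContainer".
    [FunctionName("MyNewTrigger")]
    public static Task RunAsync(
        [CosmosDBTrigger("MyDatabase", "MyNewContainer",
            ConnectionStringSetting = "CosmosConnectionString",
            LeaseCollectionName = "leases",
            // Distinct prefix so this trigger's lease documents don't collide with
            // the ones "MyTrigger" keeps in the same "leases" collection.
            LeaseCollectionPrefix = "MyNewTrigger",
            CreateLeaseCollectionIfNotExists = true,
            // Read the container's change history from the beginning instead of
            // from the current time; ignored once checkpoints exist for this prefix.
            StartFromBeginning = true)] IReadOnlyList<Document> documents,
        ILogger log)
    {
        log.LogInformation("MyNewTrigger received {Count} document(s).", documents.Count);
        return Task.CompletedTask;
    }
}
```

Deploying this alongside the existing trigger restarts the Function App, so the initialization path (including lease collection creation) runs for both triggers.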