
When I delete and re-submit a job with the same name, I often get an HTTP 409 error with a message saying that the object is being deleted -- my submit arrives before the old job object is actually removed.

My current workaround is to retry the submit in a loop until it succeeds. I don't like it: it looks quite ugly, and I wonder whether there's a way to call the deletion routine so that it waits until the object is completely deleted. According to this, kubectl waits until the object is actually deleted before returning from the delete command. Is there an equivalent option in the Python client?

Here's my spin-submit code (excerpted from a method, so not directly runnable):

import json
import time

from kubernetes import client, config
from kubernetes.client.rest import ApiException

# Set up the client
config.load_kube_config(context=context)
configuration = client.Configuration()
api_client = client.ApiClient(configuration)
batch_api = client.BatchV1Api(api_client)

job = create_job_definition(...)

# Delete the old job, then retry the submit until the old object is finally gone
batch_api.delete_namespaced_job(job.metadata.name, self.namespace)
for _ in range(50):
    try:
        return batch_api.create_namespaced_job(self.namespace, job)
    except ApiException as e:
        body = json.loads(e.body)
        job_is_being_deleted = body["message"].startswith("object is being deleted")
        if not job_is_being_deleted:
            raise
    time.sleep(0.05)

I wish it were simply:

batch_api.delete_namespaced_job(job.metadata.name, self.namespace, wait=True)
batch_api.create_namespaced_job(self.namespace, job)

I have found a similar question, and the answer suggests using a watch: start a watch in a separate thread, issue the delete command, then join the thread, which waits until the deletion is confirmed by the watch -- that seems like a lot of code for such a simple thing. A rough sketch of what I mean is below.
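For reference, this is roughly what I imagine the watch-based approach would look like (an untested sketch; the job name my-job and the namespace my-namespace are placeholders):

import threading

from kubernetes import client, config, watch

config.load_kube_config()
batch_api = client.BatchV1Api()

def wait_for_job_deletion(name, namespace, timeout=60):
    # Stream job events for this name and return once a DELETED event is seen.
    w = watch.Watch()
    for event in w.stream(batch_api.list_namespaced_job,
                          namespace=namespace,
                          field_selector=f"metadata.name={name}",
                          timeout_seconds=timeout):
        if event["type"] == "DELETED":
            w.stop()
            return

# Start the watch before issuing the delete so the DELETED event is not missed.
waiter = threading.Thread(target=wait_for_job_deletion, args=("my-job", "my-namespace"))
waiter.start()
batch_api.delete_namespaced_job("my-job", "my-namespace")
waiter.join()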

Wytrzymały Wiktor
Anton Daneyko
  • doesn't adding a thread seem unnecessary in this case (unless you have other actions), since the main thread will anyway have to wait for the watch to end before proceeding? – Krishna Chaurasia May 26 '21 at 05:50
  • @KrishnaChaurasia The doc says "Clusters using etcd3 preserve changes in the last 5 minutes by default.", so in this case you're right and it could work if I issue a deletion request and do `for e in Watch().stream(..., timeout_seconds=1): break_as_soon_as_deletion_event_encountered()`. But as an application developer I don't know what the cluster settings are, and I am not sure I can rely on events being available after I `Watch().stream()`, so I thought I needed to start the watch so it works in parallel with the deletion request. – Anton Daneyko May 27 '21 at 16:12
  • My code used to hash some job parameters and use the hash as the job name, in order to be able to map a set of parameters to a job later. If the job was invoked many times it would name-clash. I ended up re-writing the scheduling so that the job names are unique and the hash of the job parameters is stored in the job's metadata. This way I don't have a name clash any more and don't actually need to delete a job before scheduling a new one. – Anton Daneyko May 27 '21 at 16:16

1 Answer


As you have already mentioned, kubectl delete has the --wait flag, which does exactly this and is true by default.

Let's have a look at the code and see how kubectl implements this. Source.

waitOptions := cmdwait.WaitOptions{
    ResourceFinder: genericclioptions.ResourceFinderForResult(resource.InfoListVisitor(deletedInfos)),
    UIDMap:         uidMap,
    DynamicClient:  o.DynamicClient,
    Timeout:        effectiveTimeout,

    Printer:     printers.NewDiscardingPrinter(),
    ConditionFn: cmdwait.IsDeleted,
    IOStreams:   o.IOStreams,
}
err = waitOptions.RunWait()

Additionally, here are the RunWait() and IsDeleted() function definitions.

Now answering your question:

[...] which means I need to start a watch in a separate thread, issue delete command, join the thread that waits till the deletion is confirmed by the watch -- seems like a lot of code for such a thing

It does look like a lot of code, but I don't see any alternative. If you want to wait for the deletion to finish, you need to do it yourself, either by watching the job or by polling until it is gone. There does not seem to be any other way around it. A minimal polling sketch follows.
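For illustration, here is a minimal, untested sketch of the polling variant (the job name my-job and the namespace my-namespace are placeholders; it simply reads the job until the API returns 404):

import time

from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
batch_api = client.BatchV1Api()

def delete_job_and_wait(name, namespace, timeout=60, interval=0.5):
    # Issue the delete, then poll until the job object is gone (HTTP 404).
    batch_api.delete_namespaced_job(name, namespace)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            batch_api.read_namespaced_job(name, namespace)
        except ApiException as e:
            if e.status == 404:
                return  # the object is fully removed
            raise
        time.sleep(interval)
    raise TimeoutError(f"job {name} was not deleted within {timeout}s")

delete_job_and_wait("my-job", "my-namespace")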

Matt