
From https://cloud.google.com/dataflow/docs/guides/logging I am trying to understand if it is possible to reconfigure the log levels after the job and workers have started.

I suspect the answer is "no" but I wanted to see if anyone knows for sure, as the documentation does not specifically address this point.

Facilities such as log4j have the ability to dynamically monitor the logging configuration and act on it, but I don't know if Dataflow supports that.
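For example, on the JVM it is straightforward to change a level programmatically once you have a handle on the logger; what log4j adds (via `monitorInterval`) is watching the configuration file and applying such a change automatically. A minimal `java.util.logging` sketch of the programmatic part (the logger name is just an example):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelDemo {
    public static void main(String[] args) {
        Logger logger = Logger.getLogger("com.example.pipeline");

        logger.setLevel(Level.INFO);
        System.out.println(logger.isLoggable(Level.FINE));  // false: FINE is below INFO

        // This is the step log4j's monitorInterval automates after re-reading the config.
        logger.setLevel(Level.FINE);
        System.out.println(logger.isLoggable(Level.FINE));  // true
    }
}
```

The question is whether anything in Dataflow can trigger that second call on running workers.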

Eric Kolotyluk

2 Answers


Good question. I don't think there is a way to do that: once the workers have started, the only option is to stop the pipeline if you see something wrong in the logs. I don't think Dataflow supports what you are looking for.

Chaotic Pechan

AFAIK, by design, it's not possible. If you imagine the architecture of Dataflow, what do you have?

A main server that takes your code, compiles it, packages it, and deploys it on the workers (that's why, at the beginning, you have only one instance, the main server, and then autoscaling kicks in).

Then the data are pulled and transformed on the workers. The code on the workers is immutable: the main server will never update it (except if you perform a roll-out in streaming mode).


Of course, you could imagine that, on a special value read by the worker, the worker updates its own log level locally. But you cannot assume that all the workers will receive the information, because the data are sharded and each worker sees only a subset of the data.
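To make that idea concrete, here is a minimal sketch in plain Java with `java.util.logging` (not Beam); the sentinel marker and method names are invented for illustration. The level change only takes effect in the JVM that happens to process the sentinel element:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: a sentinel element raises the log level,
// but only on the worker that processes it.
public class SentinelLogLevel {
    private static final Logger LOG = Logger.getLogger("com.example.pipeline");
    static final String SENTINEL = "__SET_LOG_LEVEL__";  // invented marker

    static void processElement(String element) {
        if (element.startsWith(SENTINEL)) {
            // e.g. "__SET_LOG_LEVEL__FINE" -> Level.FINE, for this JVM only
            LOG.setLevel(Level.parse(element.substring(SENTINEL.length())));
            return;
        }
        LOG.fine("processing " + element);  // emitted only once the level allows FINE
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);
        processElement("normal record");    // FINE message suppressed
        processElement(SENTINEL + "FINE");  // raise the level locally
        processElement("another record");   // FINE message now emitted
    }
}
```

Workers that never see the sentinel keep their original level, which is exactly the sharding problem described above.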


But, at the end of the day, what's your concern? Do you have too many logs? If so, you can use a Cloud Logging router to exclude some logs. They won't be ingested and therefore won't be charged.

If the logs slow down your workload, then you have to rethink/redesign your logging strategy and levels before launching your code.

guillaume blaquiere
  • When I imagine Dataflow, the workers are just code; in my case, code running in a JVM. It does not matter that the code is immutable in the workers: it can still check for changes in its environment and act on those changes, such as changing the logging level. My concern is at runtime: I don't want to reduce logging, I want to increase it for troubleshooting purposes. – Eric Kolotyluk Aug 16 '22 at 15:00
  • The main server dispatches code to workers (several threads in several VMs; it's distributed computing, like Hadoop). And then the code doesn't change. Programmatically, you can change the log level with log4j. As I said, you can implement that in your code: if you read a special piece of data, change the log level. But that will affect only the current worker. If new workers are created, they will start with the initial log level. – guillaume blaquiere Aug 16 '22 at 16:08