We followed the Cloud Profiler documentation to enable Cloud Profiler for our Dataflow jobs, but the Profiler fails to start.
The issue is that Cloud Profiler needs the JOB_NAME and JOB_ID environment variables to start, but the worker VM only has the JOB_ID env var; JOB_NAME is missing.
The question is: why is the JOB_NAME env var missing?
Logs:
jsonPayload: {
job: "2022-09-16 13 41 20-1177626142222241340"
logger: "/us/local/lib/pvthon3.9/site-packages/apache_beam/runners/worker/sdk_worker_main.pv:177"
message: "Unable to start google cloud profiler due to error: Unable to find the job id or job name from envvar"
portability_worker_id: "sdk-0-13"
thread: "MainThread"
worker: "description-embeddings-20-09161341-k27g-harness-qxq2"
}
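For context, that error message corresponds to a check in the Beam Python worker that reads the job identity from environment variables before starting the profiler. Below is a minimal sketch of that kind of check; it is illustrative only, not Beam's exact implementation, and the function name is made up:

import logging
import os

import googlecloudprofiler


def start_profiler_if_configured(service_version='1.0.0'):
    # The Dataflow worker VM is expected to expose both of these variables.
    job_id = os.environ.get('JOB_ID')
    job_name = os.environ.get('JOB_NAME')
    if not job_id or not job_name:
        # Mirrors the log line above: the profiler is skipped because one
        # of the identifiers is missing or empty.
        logging.warning('Unable to start google cloud profiler due to error: '
                        'Unable to find the job id or job name from envvar')
        return
    googlecloudprofiler.start(
        service=job_name,                  # the profiler service label
        service_version=service_version,
        verbose=1)

Note that an empty string for JOB_NAME fails such a check the same way a missing variable does, which is relevant to the edit at the end of this question.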
The following has been done so far:
Cloud Profiler API enabled for the project
The project has enough quota.
The service account for the Dataflow job has the appropriate permissions for Profiler.
The following option was added to the pipeline:
--dataflow_service_options=enable_google_cloud_profiler
Because we deploy our pipeline from Dataflow templates, the enable_google_cloud_profiler and enable_google_cloud_heap_sampling flags were also specified as additional experiments (see the sketch below).
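For reference, this is roughly how those options are passed when constructing the pipeline in Python. It is a sketch: the project, region, and bucket values are placeholders, and only the two profiler flags come from the Dataflow documentation:

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',                    # placeholder
    region='us-central1',                    # placeholder
    temp_location='gs://my-bucket/tmp',      # placeholder
    # Preferred way to enable the profiler on the Dataflow service:
    dataflow_service_options=['enable_google_cloud_profiler'],
    # When deploying from templates, the same flags are passed as experiments:
    experiments=[
        'enable_google_cloud_profiler',
        'enable_google_cloud_heap_sampling',
    ],
)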
Edit: Found the cause.
The provisioning API returns an empty JOB_NAME, which causes boot.go to set the JOB_NAME env var to "", which in turn makes the Python SDK code fail when trying to activate googlecloudprofiler.
There is an open issue on the Issue Tracker about this.
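Until that is fixed, a possible workaround is to start the profiler agent explicitly on the worker with a hard-coded service name, so it does not depend on the JOB_NAME env var at all. This is an untested sketch, not an official fix; ProfiledDoFn and the service label are made-up examples:

import logging

import apache_beam as beam
import googlecloudprofiler


class ProfiledDoFn(beam.DoFn):
    # Starts the Cloud Profiler agent when the DoFn is set up on a worker,
    # bypassing the SDK's JOB_NAME/JOB_ID lookup.
    def setup(self):
        try:
            googlecloudprofiler.start(
                service='description-embeddings',  # explicit label instead of JOB_NAME
                service_version='1.0.0',
                verbose=1)
        except (ValueError, NotImplementedError) as exc:
            logging.warning('Profiler failed to start: %s', exc)

    def process(self, element):
        yield element  # actual per-element processing goes here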