I've seen several posts about emitting logs from a PythonOperator, and about configuring Airflow's own logging, but I haven't found anything that shows how to emit logs from within a process that runs outside the Airflow worker, e.g. the PySpark job launched by DataProcPySparkOperator.
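For context, the job is submitted with something along these lines (the bucket, cluster name, and region are placeholders):

from airflow.contrib.operators.dataproc_operator import DataProcPySparkOperator

submit_pyspark = DataProcPySparkOperator(
    task_id='run_pyspark_job',
    main='gs://my-bucket/jobs/transform.py',  # the script excerpted below
    cluster_name='my-dataproc-cluster',
    region='us-central1',
    dag=dag,
)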
I've gone so far as to include the following at the top of the PySpark script that runs on the Operator's cluster:
import logging

logging.info('Test bare logger')
for ls in ['airflow', 'airflow.task', __name__]:
    l = logging.getLogger(ls)
    l.info('Test {} logger'.format(ls))
print('Test print() logging')
None of it produces any output, although the script otherwise runs as intended.
I assume I could build a connection to Cloud Storage (or the database) from within the cluster, perhaps piggybacking on the existing connection used to read and write files, but that seems like a lot of work for such a common need. All I really want is an occasional status check, e.g. the number of records at intermediate stages of the computation.
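To make that concrete, this is the kind of workaround I have in mind, reusing the GCS connector the cluster already has configured (the bucket and path are placeholders; this is just a sketch, not something Airflow provides):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def write_status(message, path='gs://my-bucket/status/record_counts'):
    # Append a one-line text file alongside the job's other output,
    # piggybacking on the connection the job already uses for reads/writes.
    spark.createDataFrame([(message,)], ['status']) \
        .coalesce(1).write.mode('append').text(path)

# e.g. after an intermediate stage:
# write_status('rows after filtering: {}'.format(intermediate_df.count()))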
Does Airflow set up a Python logger in the cluster by default? If so, how do I access it?