39

AWS Glue jobs log output and errors to two different CloudWatch logs, /aws-glue/jobs/error and /aws-glue/jobs/output by default. When I include print() statements in my scripts for debugging, they get written to the error log (/aws-glue/jobs/error).

I have tried using:

log4jLogger = sparkContext._jvm.org.apache.log4j 
log = log4jLogger.LogManager.getLogger(__name__) 
log.warn("Hello World!")

but "Hello World!" doesn't show up in either of the logs for the test job I ran.

Does anyone know how to go about writing debug log statements to the output log (/aws-glue/jobs/output)?

TIA!

EDIT:

It turns out the above actually does work. What was happening was that I was running the job in the AWS Glue Script editor window which captures Command-F key combinations and only searches in the current script. So when I tried to search within the page for the logging output it seemed as if it hadn't been logged.

NOTE: I did discover through testing the first responder's suggestion that AWS Glue scripts don't seem to output any log message with a level less than WARN!

vahdet
Jesse Clark
  • Do you need to import anything to use `log4jLogger`? Somehow, after adding these three lines to my script, my job hangs. The status shows `running` but no log is generated – cozyss Jul 24 '18 at 18:37
  • This does not work for me in the Glue job. I am outputting WARN level logs and cannot see them in CloudWatch. Is there anything else you needed to get it working? Thanks – padr Sep 06 '18 at 13:43
  • @padr I had the same problem. When you view the logs, you need to search for the log text in the **filter event** search box. Log some nonsense text that will not appear in any other log records to test this. – Arran Duff Aug 01 '19 at 13:49

7 Answers

37

I know the question is not new, but this may be helpful for someone. For me, logging in Glue works with the following lines of code:

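from pyspark.context import SparkContext
from awsglue.context import GlueContext

# create the Spark and Glue contexts
sc = SparkContext()
glueContext = GlueContext(sc)
# get the Glue logger
logger = glueContext.get_logger()
...
# write into the log with:
logger.info("s3_key:" + your_value)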
Lars
  • What does the s3 key mean here? @Lars, is it possible to write the error messages to a file in s3? – anidev711 May 05 '20 at 05:51
  • Official documentation on the subject https://docs.aws.amazon.com/glue/latest/dg/monitor-continuous-logging-enable.html – selle May 14 '20 at 13:49
  • Couple of things to note: 1. The Glue logger does not take msg format strings; it expects full strings (so you have to handle the arguments yourself). 2. The Glue logger doesn't seem to be able to be broadcast out to workers, so if you're trying to log from UDFs you'll need to use the Python logger. – aiguofer Jun 01 '20 at 18:22
  • What if I want to print out an intermediate data value, such as the input data, so that I can debug? I used `logger.info(input_data)` but it doesn't seem to work. – wawawa Apr 15 '21 at 12:47
  • @anidev711 the s3 key here is just an example of the content of a log message. You pass whatever you want in the contents of your logs to `logger.info()` – falsePockets May 19 '22 at 00:48
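
Following up on the comments above about format strings and logging data values, here is a minimal sketch. It assumes the Glue logger only accepts a finished string, as aiguofer reports. `format_log_message` is a hypothetical helper, not part of the Glue API; it builds the full string up front so that any value, including a dict or a list, can be passed through.

```python
def format_log_message(template, *args):
    """Build the complete message string before handing it to a logger.

    Hypothetical helper (not part of awsglue): applies %-style formatting
    when arguments are given, otherwise converts the value with str().
    """
    return template % args if args else str(template)


# Plain Python values are used here; in a Glue job the result would be
# passed on, e.g. logger.info(format_log_message(...)).
message = format_log_message("s3_key: %s, rows: %d", "data/2020/part-0", 42)
print(message)
```

In a job this would be called as `logger.info(format_log_message("input: %s", input_data))`, which may also help with the case where `logger.info(input_data)` is handed a non-string value.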
33

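Try the built-in Python logger from the logging module. By default, basicConfig() attaches a handler that writes to the standard error stream, so pass stream=sys.stdout to send messages to standard output instead.

import logging
import sys

MSG_FORMAT = '%(asctime)s %(levelname)s %(name)s: %(message)s'
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
logging.basicConfig(format=MSG_FORMAT, datefmt=DATETIME_FORMAT, stream=sys.stdout)
# replace the placeholder with any meaningful name, e.g. the application name
logger = logging.getLogger("<logger-name-here>")

logger.setLevel(logging.INFO)

...

logger.info("Test log message")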
Alexey Bakulin
  • Turns out the way I was originally trying to log works too. I also discovered that AWS Glue pyspark scripts won't output anything less than a WARN level (see edits above). I'll accept your answer since it works too. Thanks! – Jesse Clark Feb 26 '18 at 17:58
  • What should I write as the logger name so that CloudWatch shows my log? – Marcel Bezerra Feb 21 '19 at 18:17
  • Any meaningful string you want, for ex. application name. This value will be used in place of `%(name)s` in a log message. – Alexey Bakulin Feb 22 '19 at 08:19
  • Is it possible to write only the custom messages to s3? – anidev711 May 05 '20 at 05:52
  • Hi, I have a small question: can I store the log info in an s3 bucket? I tried logging.basicConfig(filename='s3:///spark.logs',level=logging.INFO), but it didn't work. @AlexeyBakulin – JP Jack Jul 02 '20 at 07:18
  • What if I want to print out an intermediate data value, such as the input data, so that I can debug? I used `logger.info(input_data)` but it doesn't seem to work. – wawawa Apr 15 '21 at 12:47
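
On the S3 questions above: `logging.basicConfig(filename=...)` opens a file on the local filesystem, so an `s3://` path will not reach a bucket. One possible workaround, sketched under assumptions, is to collect messages in memory and upload them once at the end of the job. The logger name, bucket, and key below are hypothetical, and `upload_log_to_s3` assumes boto3 is available and that the job role is allowed to write to the bucket.

```python
import io
import logging

# Collect log records in memory instead of a local file.
log_buffer = io.StringIO()
buffer_handler = logging.StreamHandler(log_buffer)
buffer_handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)

logger = logging.getLogger("my-glue-job")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(buffer_handler)

logger.info("Test log message")


def upload_log_to_s3(bucket, key):
    """Upload everything logged so far as a single S3 object."""
    import boto3  # imported here so the rest runs without boto3 installed

    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=log_buffer.getvalue().encode("utf-8")
    )


# At the end of the job (hypothetical bucket and key):
# upload_log_to_s3("my-log-bucket", "glue/spark.log")
```

Because only this named logger gets the buffer handler, the uploaded object would contain only the custom messages, not Spark's own output.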
9

I noticed the above answers are written in Python. For Scala you could do the following:

import com.amazonaws.services.glue.log.GlueLogger

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val logger = new GlueLogger
    logger.info("info message")
    logger.warn("warn message")
    logger.error("error message")
  }
}

Both the Python and Scala solutions are covered in the official documentation.

RobotCharlie
4

Just in case this helps. This works to change the log level.

from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext()
sc.setLogLevel('DEBUG')
glueContext = GlueContext(sc)
logger = glueContext.get_logger()
logger.info('Hello Glue')
Simon77
2

This worked for INFO level in a Glue Python job:

import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
root.info("check")

Masi N.
1

I faced the same problem. I resolved it by adding logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))

Before that, nothing was printed at all, not even at ERROR level.

The idea was taken from here https://medium.com/tieto-developers/how-to-do-application-logging-in-aws-745114ac6eb7

Another option would be to log to stdout and hook AWS logging up to stdout (using stdout is actually one of the best practices in cloud logging).

Update: it works only with setLevel("WARNING"), and only ERROR or WARNING messages are printed. I didn't find how to make it work for the INFO level :(
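
One possible explanation for the missing INFO messages, at least in plain Python logging: a handler only sees records that first pass the logger's own level, and the root logger defaults to WARNING. Whether this is the whole story inside a Glue job is an assumption. The sketch below shows only the standard-library behavior, using an in-memory stream in place of stdout and a hypothetical logger name.

```python
import io
import logging

stream = io.StringIO()  # stands in for sys.stdout so output can be inspected
demo_logger = logging.getLogger("level-demo")  # hypothetical logger name
demo_logger.propagate = False  # keep the demo isolated from root handlers
demo_logger.addHandler(logging.StreamHandler(stream))

# No level set: the logger inherits WARNING from the root logger,
# so this INFO record is dropped before any handler sees it.
demo_logger.info("dropped")

# After lowering the logger's level, INFO records reach the handler.
demo_logger.setLevel(logging.INFO)
demo_logger.info("emitted")

print(stream.getvalue())
```

Adding a handler alone is therefore not enough for INFO output; the logger's level has to be lowered as well.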

feechka
0

If you're just debugging, print() (Python) or println() (Scala) works just fine.

ChrisGPT was on strike
  • `print()` works, kind of. But all `print()` statements land in a single line in the Glue log which is not ideal. – Piotr L Mar 15 '23 at 18:28