I am running the following code on AWS EMR:

    from operator import add

    from pyspark.sql import SparkSession

    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()
    sc = spark.sparkContext

    def f(_):
        print("executor running") # <= I cannot find this output
        return 1

    output = sc.parallelize(range(1, 3), 2).map(f).reduce(add)
    print(output) # <= I found this output
    spark.stop()

I am recording logs to S3 (the Log URI is s3://brand17-logs/).

I can see the output from the master node here:

s3://brand17-logs/j-20H1NGEP519IG/containers/application_1618292556240_0001/container_1618292556240_0001_01_000001/stdout.gz

Where can I see the output from the executor node?

I see this output when running locally.

Andrey

1 Answer

You are almost there in browsing the log files.

The general layout of the stored logs is something like this: inside the containers path there is one directory per application ID (such as application_1618292556240_0001), and inside it one directory per container. The first container (the one ending in 000001) holds the driver's logs, and the rest hold the executors' logs.

I have no official documentation that states this, but I have seen it in all my clusters.

So if you browse to the other container directories, you will be able to see the executor log files.
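The convention above can be sketched in Python as a small helper that groups log keys by container and labels each one. This is only an illustration: `group_logs_by_container` is a hypothetical name, the `000002` key is a made-up example of a second container, and the driver/executor labelling simply encodes the undocumented convention described above.

```python
import re
from collections import defaultdict

# Matches paths such as
# .../application_1618292556240_0001/container_1618292556240_0001_01_000001/stdout.gz
CONTAINER_RE = re.compile(r"(application_\d+_\d+)/(container_\d+_\d+_\d+_(\d+))/")


def group_logs_by_container(keys):
    """Group log keys by container and guess each container's role."""
    grouped = defaultdict(list)
    for key in keys:
        match = CONTAINER_RE.search(key)
        if match is None:
            continue  # not a container log, skip it
        app_id, container_id, sequence = match.groups()
        # Assumption: container number 1 hosts the driver and every other
        # container is an executor (observed convention, not documented).
        role = "driver" if int(sequence) == 1 else "executor"
        grouped[(app_id, container_id, role)].append(key.rsplit("/", 1)[-1])
    return dict(grouped)


prefix = "j-20H1NGEP519IG/containers/application_1618292556240_0001/"
keys = [
    prefix + "container_1618292556240_0001_01_000001/stdout.gz",
    prefix + "container_1618292556240_0001_01_000001/stderr.gz",
    # Hypothetical second container, used only for illustration.
    prefix + "container_1618292556240_0001_01_000002/stdout.gz",
]

for (app_id, container_id, role), files in sorted(group_logs_by_container(keys).items()):
    print(role, container_id, files)
```

In practice the list of keys could come from `aws s3 ls --recursive` or a local directory walk; the helper only looks at the path structure.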

Having said that, browsing through so many executors to search for a log line is very painful.

Here is how I personally view the logs from an EMR cluster:

  1. Log in to an EC2 instance that has enough access to download files from the S3 location where the EMR logs are saved.

  2. Navigate to the right path on the instance.

    mkdir -p /tmp/debug-log/ && cd /tmp/debug-log/

  3. Download all the files from S3 recursively.

    aws s3 cp --recursive s3://your-bucket-name/cluster-id/ .

In your case, it would be

    aws s3 cp --recursive s3://brand17-logs/j-20H1NGEP519IG/ .

  4. Uncompress the log files:

    find . -type f -name '*.gz' -exec gunzip {} \;

  5. Now that all the compressed files are uncompressed, run a recursive grep:

    grep -inR "message-that-i-am-looking-for" .

The grep flags mean the following:

    i -> case-insensitive match
    n -> display the line number of each match (the file name is shown as well because several files are searched)
    R -> search recursively

  6. Open the file reported by the grep command in vi and read the surrounding log lines there.
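If you prefer not to uncompress everything first, the gunzip and grep steps can be approximated in Python. This is a minimal sketch, assuming the logs were already downloaded to `/tmp/debug-log` as in the steps above; `grep_gz_logs` is a hypothetical helper name, and it uses only the standard library.

```python
import gzip
import os


def grep_gz_logs(root, needle):
    """Return (path, line_number, line) for each case-insensitive match under root."""
    needle = needle.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".gz"):
                continue  # only compressed log files are searched
            path = os.path.join(dirpath, name)
            # Read the compressed file as text; undecodable bytes are replaced.
            with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if needle in line.lower():
                        hits.append((path, line_number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed download location from the steps above.
    for path, line_number, line in grep_gz_logs("/tmp/debug-log", "executor running"):
        print(f"{path}:{line_number}:{line}")
```

The output mimics `grep -inR`: one `path:line:text` entry per match, so you can jump straight to the relevant container's file.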

Further reading can be found here:

View Log Files

access spark log

Ajay Kr Choudhary
  • There are three files in another container: `prelaunch.out.gz`, `stderr.gz`, `stdout.gz`. None of these files contains the line `executor running`. – Andrey Apr 13 '21 at 14:24
  • Unfortunately I am not able to connect to EC2 with ssh: https://superuser.com/questions/1640923/can-not-connect-to-amazon-emr-claster-with-putty – Andrey Apr 13 '21 at 14:26
  • Ahh, I don't know how to solve the second problem. :( How many containers do you have in total? Is there only one, with the above 3 files? I suspect one more thing: a simple print statement is not getting saved to S3. You have to use a logger for this and specify the logger properties when launching the EMR cluster. – Ajay Kr Choudhary Apr 13 '21 at 14:34
  • https://stackoverflow.com/questions/42616021/aws-emr-spark-python-logging Please see this in case it helps, in particular the answer by braj: he redirects the output specifically to a file. – Ajay Kr Choudhary Apr 13 '21 at 14:36
  • You don't necessarily have to log in to the instance created by EMR to view the logs. They can be downloaded from any EC2 instance that has the relevant access to download files from S3. – Ajay Kr Choudhary Apr 13 '21 at 15:29
  • That post is about output from the master. I tried braj's approach (I added `> s3://brand17-stock-prediction/log.out 2>&1 &` as a step argument) and it doesn't work: there is no `log.out` file in the bucket. I have two containers, master and executor, with three files in each. Could you please explain how I can download from an EC2 instance without connecting to it? – Andrey Apr 14 '21 at 10:30
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/231120/discussion-between-ajay-kr-choudhary-and-andrey). – Ajay Kr Choudhary Apr 14 '21 at 13:13