How to fix "CommunicationsException: Communications link failure"?

Asked Feb 26 '21 at 12:31

Active Feb 26 '21 at 17:56

Viewed 116 times

I am trying to export a large MySQL table with 350M records to Parquet files in S3.

Here is the code I tried:

    df = sparkSession.read.format('jdbc').options(
        url=db_url,
        driver='com.mysql.jdbc.Driver',
        dbtable='table_name',
        user=db_user,
        password=db_pwd,
        partitioncolumn='id',
        lowerbound=0,
        upperbound=1000000,
        numpartitions=10
    ).load()

    df.write.parquet(output_path, mode='overwrite')

It runs for 25 minutes on an EMR cluster of r5.2xlarge instances (1 Master, 10 Core and 10 Task) and then fails with the error `com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure`.

Earlier I tried without the `numpartitions`, `lowerbound`, `upperbound` and `partitioncolumn` options and got the same error. Based on similar issues reported on Stack Overflow, I added those options, but the error still occurs.
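As far as I understand from the Spark documentation, `lowerbound` and `upperbound` only determine the partition stride and do not filter rows. The sketch below is a plain-Python approximation of how those options might translate into per-partition `WHERE` clauses. The function name `jdbc_partition_predicates` is hypothetical (it is not a Spark API), and the integer stride arithmetic is a simplification of what Spark does internally.

```python
def jdbc_partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the per-partition WHERE clauses of a partitioned JDBC read.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if num_partitions <= 1:
        # A single partition reads the whole table with no predicate.
        return []

    # Simplified stride: the bounds only decide the step size.
    stride = upper_bound // num_partitions - lower_bound // num_partitions

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit.
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None

        if lower is None:
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


# Same values as in my read options above.
for predicate in jdbc_partition_predicates("id", 0, 1000000, 10):
    print(predicate)
```

If this is right, the last partition has no upper limit, so any `id` values beyond `upperbound` would all be read by that one partition. I am not sure whether that is related to the connection failure.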

Any help would be highly appreciated.

edited Feb 26 '21 at 17:56 by Jacek Laskowski

asked Feb 26 '21 at 12:31 by sufinsha
