
I am trying to run a script from Google CPB100 - Lab3b (train_and_apply.py) on Dataproc against Cloud SQL (a MySQL database), but I get a timeout.

Caused by: java.net.ConnectException: Connection timed out (Connection timed out)

From the Dataproc master I can connect with the mysql command line, but not with the Python commands from the script. What can I do to diagnose this issue?

Success

$> mysql --host=35.194.7.XXX --user=root --password 

Timeout

$> pyspark

>>> jdbcDriver = 'com.mysql.jdbc.Driver'
>>> jdbcUrl = 'jdbc:mysql://35.194.7.XXX:3306/recommendation_spark?user=root&password=XXXX'
>>> dfRates = sqlContext.read.format('jdbc').options(driver=jdbcDriver, url=jdbcUrl, dbtable='Rating').load()
Seguy
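Since the mysql client works from the master while the pyspark read times out, one common cause is that the JDBC connections are opened by the executors on the worker nodes, whose IPs may not be authorized on the Cloud SQL instance. A minimal TCP probe to narrow this down, sketched below (the `check_port` helper and the `parallelize` usage are illustrative additions, not from the original post):

```python
import socket

def check_port(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# In the pyspark shell, run the same probe on the executors. The mysql CLI
# test above only proves the *master* can reach the instance; the JDBC read
# is opened by executors on the worker nodes.
# results = (sc.parallelize(range(8))
#              .map(lambda _: check_port('35.194.7.XXX', 3306))
#              .collect())
# print(results)  # any False means a worker cannot reach port 3306
```

If the probe succeeds on the master but fails from the executors, the worker nodes' addresses likely need to be authorized on the Cloud SQL instance (or the connection routed through something like the Cloud SQL proxy).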

1 Answer


I'm not sure what is wrong based on your question, but I would recommend editing the log4j config as described in this StackOverflow post to see whether there are important INFO or DEBUG logs under com.mysql or org.apache.spark.sql.jdbc.
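As a concrete illustration of that suggestion, package-level verbosity can be raised with lines like these in Spark's log4j properties file (the path below is Dataproc's usual location, but verify on your cluster):

```
# /etc/spark/conf/log4j.properties  (typical Dataproc location; adjust if different)
log4j.logger.com.mysql=DEBUG
log4j.logger.org.apache.spark.sql.jdbc=DEBUG
```

Unlike sc.setLogLevel("ALL"), which changes the root level everywhere, this targets only the two packages of interest, so the driver output stays readable.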

Patrick Clay
  • Hi Patrick, raising the log level doesn't help. I get more verbosity from the cluster side ('org.apache.hadoop') but nothing new from the SQL side ('com.mysql.jdbc'), which is what raises the error. [Code added to the script: sc.setLogLevel("ALL")] Thanks anyway. – Seguy Nov 08 '17 at 09:11