0

I managed to connect to cloud sql via JDBCIO

DataSourceConfiguration.create("com.mysql.jdbc.Driver","jdbc:mysql://google/?cloudSqlInstance=::&socketFactory=com.google.cloud.sql.mysql.SocketFactory&user=&password=")

This works, however, the batch writes takes between 2-5 minutes for 1000 records, which is terrible. i have tried different networks to see if this was related, and the results were consistent.

Anyone have any ideas?

MTS57017
  • 35
  • 1
  • 8
  • i am curious if it has anything to do with the proxy that dataflow creates to facilitate communication to the instance.....? – MTS57017 Sep 21 '18 at 13:59

1 Answers1

0

Where are you initializing this connection? If you are doing this inside of your DoFn it will create latency as the socket is built up and torn down on each bundle.

Have a look at DoFn.Setup this provides a clean way to init resources that will be persisted across bundle calls.

Eric Schmidt
  • 1,257
  • 12
  • 12
  • thanks for the reply. here is the chunk in question.... p.apply("Reading from file for cloud sql", TextIO.read().from("")) .apply("Write to cloud sql", JdbcIO.write().withDataSourceConfiguration(DataSourceConfiguration.create("com.mysql.jdbc.Driver","jdbc:mysql://google/?cloudSqlInstance=::&socketFactory=com.google.cloud.sql.mysql.SocketFactory&user=&password=")).withStatement("(INSERT STATEMENT) values()") .withPreparedStatementSetter((element, query) -> { try { // a few setString commands here } catch(Exception e) { LOG.warn(e.getMessage()); } }) ); – MTS57017 Sep 21 '18 at 16:50
  • Are you saying the issue is the above setup? – MTS57017 Sep 21 '18 at 16:50
  • Are you using local runner (DirectRunner) or the DataflowRunner? And version of Beam are you using? – Eric Schmidt Sep 21 '18 at 19:07
  • i am using DirectRunner currently, although i have tried both, and same issue. – MTS57017 Sep 21 '18 at 19:57
  • version is 2.5.0 of Beam APis – MTS57017 Sep 21 '18 at 19:57
  • Hmm - this is odd. Can you re-run the job on the DataflowRunner and post a job id? – Eric Schmidt Sep 22 '18 at 13:49
  • Absolutely. I’ll do it now. Give me about 15 min – MTS57017 Sep 22 '18 at 13:53
  • job id = 2018-09-22_07_11_24-12308066841020523793 i'll leave it running, let me know when i can kill it – MTS57017 Sep 22 '18 at 14:12
  • according to the job, it took over 3 hours to write 99,000 records, and it is still not done yet (there is around 110,000 records if i recall).....see anything out of the norm? – MTS57017 Sep 22 '18 at 18:05
  • update: it ended up being over 5 hours and some change before i killed it. – MTS57017 Sep 23 '18 at 01:46
  • Side note, I tried Spanner (just to see if there were similar issues) and had no issues. Spanner wrote 130000 records in around 2-5 seconds. – MTS57017 Sep 23 '18 at 03:14
  • I created a request to have someone look at this in more detail. Please ping this thread if you don't see an update in the next few days. – Eric Schmidt Oct 15 '18 at 15:59
  • Two things to investigate: 1) There are ton of There is a lot of "java.lang.RuntimeException: java.sql.SQLException: No value specified for parameter 1" exceptions in the job logs. Can you look at your logs and see where this might be surfacing? 2) Can you log the steps in your DoFn.Setup() - can you confirm that you are not doing any custom setup? – Eric Schmidt Oct 16 '18 at 15:53
  • 1
    I don't think we have tested the JDBCIO -> Cloud SQL connection properly so there can be performance issues there. I would also check the error Eric mentioned earlier. Currently JDBCIO is configured to push batches of 1K records. – chamikara Oct 17 '18 at 18:57
  • Probably related - https://stackoverflow.com/questions/8182880/sqlexception-no-value-specified-for-parameter-1. – chamikara Oct 20 '18 at 00:37
  • Feel free to create a Beam JIRA if you think the issue is in Beam code. – chamikara Oct 20 '18 at 00:38
  • Interesting. Thanks will test Eric. Even when testing locally, it still take the same amount of time, very slow. I’ll go ahead and file a ticket. Where can I do this? – MTS57017 Oct 21 '18 at 01:44
  • Link for creating a JIRA here. You can also try to reach the community through dev list or Slack. – chamikara Oct 23 '18 at 17:56
  • i need to test this on Beam 2.8 to see if this is any better. I had to move on and make progress in other areas. It doesn't look like i can log issues either via Jira. Communicating via Slack was not very helpful, as it seems rare to get a response. – MTS57017 Jan 02 '19 at 15:22