I have a Java snippet that reads records from a remote Oracle DB (at least 65k records). Essentially, we are trying to pass an hourly filter to the DataFrame to fetch the records, using one partition per hour (24 partitions).

The source view is based on a table with millions of records.

The problem we are facing is that Spark (on YARN or as a Spark cluster) processes 22 of the 24 partitions in under 3 minutes, while the last 2 partitions take more than 5 hours to complete.

Is there any way we can speed this up using DataFrames?

HashMap<String, String> options = new HashMap<>();
// The Spark property name is plural: spark.sql.shuffle.partitions
sqlContext.setConf("spark.sql.shuffle.partitions", "50");
options.put("dbtable", "( select * from "+VIEW_NAME+" where 1=1)");
options.put("driver", "oracle.jdbc.OracleDriver");
options.put("url", JDBC_URL);
options.put("partitionColumn", "hrs");
options.put("lowerBound", "00");
options.put("upperBound", "23");
options.put("numPartitions", "24");

DataFrame dk = sqlContext.load("jdbc", options).cache();   
dk.registerTempTable(VIEW_NAME);
dk.printSchema();
DateTime dt = new DateTime(2015, 5, 8, 10, 0, 0);
String s = SQL_DATE_FORMATTER.print(dt);
dt = dt.plusHours(24);
String t = SQL_DATE_FORMATTER.print(dt);
System.out.println("S is " + s + " and t is " + t);
// collectAsList() pulls every matching row to the driver before streaming
Stream<Row> rows = dk.filter("DATETIME >= '" + s + "' and DATETIME <= '" + t + "'").collectAsList().parallelStream();
System.out.println("Collected " + rows.count());
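
For readers following the partition options above: the Spark JDBC documentation notes that lowerBound and upperBound only set the partition stride and do not filter rows, so values outside the bounds (and NULLs) still have to land in some partition. The sketch below is a simplified model of that idea, not Spark's actual code, and the exact rule differs between Spark versions. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of range partitioning on a numeric column.
// Hypothetical helper, NOT Spark's implementation.
class PartitionPredicateSketch {

    static List<String> predicates(String column, long lower, long upper, int numPartitions) {
        List<String> out = new ArrayList<>();
        if (numPartitions <= 1) {
            out.add("1=1"); // a single partition reads everything
            return out;
        }
        // Integer stride; clamp to 1 so a narrow range never yields a zero stride.
        long stride = Math.max(1, (upper - lower + 1) / numPartitions);
        for (int i = 0; i < numPartitions; i++) {
            long lo = lower + i * stride;
            long hi = lo + stride;
            if (i == 0) {
                // First partition is open below and also takes NULLs.
                out.add(column + " < " + hi + " OR " + column + " IS NULL");
            } else if (i == numPartitions - 1) {
                // Last partition is open above.
                out.add(column + " >= " + lo);
            } else {
                out.add(column + " >= " + lo + " AND " + column + " < " + hi);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> p = predicates("hrs", 0, 23, 24);
        System.out.println(p.get(0));  // hrs < 1 OR hrs IS NULL
        System.out.println(p.get(23)); // hrs >= 23
    }
}
```

Under this model, any skew in how rows are distributed over hrs, including NULL or out-of-range values, shows up as oversized edge partitions, so counting rows per hrs value in the source view is a reasonable first check.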
RvK
  • Any updates on this one? Ever found a fix? – zengr Sep 13 '15 at 07:26
  • Nope, no updates. But I did find one thing: it is processing the 00 to 23 time partitions and then doing a single-partition M/R for the whole time range (00-23), hence it was not working. – RvK Sep 15 '15 at 21:58
  • As a workaround, we should change it to dt.plusHours(24).minusSeconds(1) – RvK Sep 15 '15 at 22:00

1 Answer

Not sure if this is a complete answer, but as a workaround, if we do the following:

dt = dt.plusHours(24).minusSeconds(1);

it is faster, but still not as fast as the first 23 partitions.
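
The effect of the workaround on the filter can be seen in isolation: with an inclusive upper bound (DATETIME <= t), a record stamped exactly 24 hours after the start still matches, and subtracting one second excludes it. The sketch below is a minimal illustration that uses java.time.LocalDateTime in place of the Joda-Time DateTime from the question (an assumption, made only to keep the example dependency-free). The class and method names are hypothetical.

```java
import java.time.LocalDateTime;

// Illustrates inclusive vs. half-open time windows. Hypothetical names.
class HourWindowSketch {

    // Mirrors "DATETIME >= s and DATETIME <= t": both ends inclusive.
    static boolean inClosedWindow(LocalDateTime ts, LocalDateTime start, LocalDateTime end) {
        return !ts.isBefore(start) && !ts.isAfter(end);
    }

    // Half-open alternative: start inclusive, end exclusive.
    static boolean inHalfOpenWindow(LocalDateTime ts, LocalDateTime start, LocalDateTime endExclusive) {
        return !ts.isBefore(start) && ts.isBefore(endExclusive);
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2015, 5, 8, 10, 0, 0);
        LocalDateTime end = start.plusHours(24);
        LocalDateTime boundary = end; // a record stamped exactly at 2015-05-09T10:00

        System.out.println(inClosedWindow(boundary, start, end));                 // true
        System.out.println(inClosedWindow(boundary, start, end.minusSeconds(1))); // false
        System.out.println(inHalfOpenWindow(boundary, start, end));               // false
    }
}
```

A half-open filter (DATETIME >= s and DATETIME < t) expresses the same intent without depending on the timestamp resolution, whereas minusSeconds(1) would miss sub-second values in the final second. This only shows which rows the filter selects; it does not by itself explain the remaining slowness.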

RvK