
We have a CDAP application that connects to a Phoenix table from Spark using the Phoenix driver. We have Phoenix version 4.7 in our environment. As per the standard Spark 2 Phoenix connectivity, only phoenix-spark2 is required as a dependency; all other dependencies are picked up from the classpath and the hbase-site.xml properties.

Now, what dependencies are required by a CDAP Spark Phoenix application, and how can I use hbase-site.xml with the CDAP application to make a successful connection?
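
For reference, one way to sanity-check whether hbase-site.xml is being picked up from the classpath (just a sketch, not CDAP-specific; hbase.zookeeper.quorum is only an example property):

import org.apache.hadoop.hbase.HBaseConfiguration

// HBaseConfiguration.create() loads hbase-default.xml and hbase-site.xml from the classpath;
// if the quorum prints correctly here, the Phoenix driver should see the same settings.
val hbaseConf = HBaseConfiguration.create()
println(hbaseConf.get("hbase.zookeeper.quorum"))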

ae8

1 Answer


This is an answer for the Spark version and not CDAP; it may still be useful if someone lands here.

I currently use Phoenix version 4.7 and Spark version 2.3 in production. I have the following Phoenix-related dependencies in my pom.xml:

<phoenix-version>4.7</phoenix-version>

<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-spark2</artifactId>
    <version>4.7.0.2.6.5.3007-3</version>
    <exclusions>
        <exclusion>
            <groupId>sqlline</groupId>
            <artifactId>sqlline</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-client</artifactId>
    <version>4.14.1-HBase-1.1</version>
</dependency>

Also, say for example I want to retrieve a table from Phoenix into a Spark DataFrame, I would use the following Spark code:

// Reads NAMESPACE.TABLE_NAME from Phoenix into a DataFrame via the phoenix-spark data source
val sqlContext = spark.sqlContext
val table = sqlContext.load("org.apache.phoenix.spark",
  Map("table" -> "NAMESPACE.TABLE_NAME",
    "zkUrl" -> zookeeperUrl))

Let me know if this doesn't work out

Anish Nair
  • Thanks Anish for the reply. I have tried adding the above dependencies and I got the below error: "org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions: Tue Jan 28 09:22:40 BST 2020, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60302: ". By the way, was the above Spark Phoenix setup tried with a CDAP application? – ae8 Feb 04 '20 at 20:28
  • Oh, this was more like a Spark application instead of CDAP. From that error, I am guessing the connection wasn't successful; try using only one of the ZooKeeper URLs instead of 3. Can you test the connection in any way? With Spark, we can test the connection in a spark-shell. Is there any way you can do the same for a CDAP application? – Anish Nair Feb 04 '20 at 20:39
  • The connection was not successful. I am trying with one ZK URL instead of 3. The same is working with spark-shell. Unfortunately, we don't have any such method to test the connection. – ae8 Feb 05 '20 at 06:34
  • Is this something similar to this post? https://stackoverflow.com/questions/48219169/3600-seconds-timeout-that-spark-worker-communicating-with-spark-driver-in-heartb – ae8 Feb 05 '20 at 07:07
  • That post is more about Spark, but the issue you are seeing is with Phoenix. How big is the table? – Anish Nair Feb 05 '20 at 18:24
  • I am trying with a small table which has less than 1000 records. The only issue is with the connection. I have tried bundling the jar with the phoenix-client 4.7 version, but it fails to bundle because it has been patched. – ae8 Feb 05 '20 at 21:16
  • Hmm, so it's not even a timeout. How much of an effort is it to move from CDAP to Spark? – Anish Nair Feb 05 '20 at 22:03
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/207312/discussion-between-anish-nair-and-ae8). – Anish Nair Feb 05 '20 at 22:14