
I'm querying a Spark database table using Spark 3.x SQL in Scala 2.12, following examples from the Internet.

The database I'm using is Spark SQL's own database, running on CentOS 7. The example table I'm querying has the following column:

create table example (tutorial_title VARCHAR(22) NOT NULL);

val example = spark.read.format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/test2")
  .option("dbtable", "example")
  .option("user", "username")
  .option("password", "123456")
  .option("fetchsize", "10")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .load()

This gives me the following output:

+--------------+
|tutorial_title|
+--------------+
|tutorial_title|
|tutorial_title|
|tutorial_title|
+--------------+

i.e. it repeats the column name for each row instead of returning the data. My table has 3 rows; if I change the number of rows in the database, the output changes accordingly.

If I use ./bin/spark-sql and select from the table, it shows me the actual records, but ./bin/spark-shell gives the column names as the results/records.

The spark-sql and Beeline clients return the correct records, but Spark's read.format("jdbc") in spark-shell returns the wrong records shown above.

  • The create table has one column, but in the response you have 3 columns. Also, did you create the table in the default schema or a custom schema? – dassum Jul 30 '20 at 17:02
  • A year back, I got the same duplicate-header issue with Impala and Spark; we fixed it by changing how the data is read, from the Impala driver to the Hive driver for the Impala tables. Can you share the Hive version and Hive driver version you are using? – sathya Jul 30 '20 at 18:09
  • Sorry, I added a different table's output. I have a single column, and a single column as output. I created the table in the test2 schema, which I am able to access using the SQuirreL SQL client. – Bharat Bhushan Jul 31 '20 at 06:26
  • As an overall process: I downloaded Spark 3.x, extracted it, and started the Spark Thrift Server. Then I connected using Beeline and created my table in the test2 schema. In this setup, I can see Hive 2.3.7 jars in the Spark folder. – Bharat Bhushan Jul 31 '20 at 06:28
  • I got the answer for this: it's a Hive driver issue, and we can use Cloudera's Hive driver instead. Check these URLs: https://stackoverflow.com/questions/48087779/sqlcontext-hivedriver-error-on-sqlexception-method-not-supported – Bharat Bhushan Jul 31 '20 at 08:14
  • https://community.cloudera.com/t5/Support-Questions/SQL-Query-Failed-with-Cloudera-Hive-JDBC-driver-but-works/m-p/63211/highlight/true#M4343 – Bharat Bhushan Jul 31 '20 at 08:15
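The driver swap described in the comments above could be sketched as follows. This is only a sketch: the Cloudera driver class name and setup are assumptions based on the linked threads, and the corresponding driver jar must be on Spark's classpath.

```scala
// Sketch of the workaround from the comments: read through Cloudera's Hive
// JDBC driver instead of org.apache.hive.jdbc.HiveDriver.
// The class name (com.cloudera.hive.jdbc41.HS2Driver) is an assumption based
// on the linked Cloudera thread; adjust it to match the driver jar you
// actually install, and make sure that jar is on Spark's classpath
// (e.g. via --jars when launching spark-shell).
val example = spark.read.format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/test2")
  .option("dbtable", "test2.example")
  .option("user", "username")
  .option("password", "123456")
  .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
  .load()
```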

1 Answer

Hey, can you try giving the table name like this:

val example = spark.read.format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/test2")
  .option("dbtable", "test2.example")
  .option("user", "username")
  .option("password", "123456")
  .option("fetchsize", "10")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .load()

Even when you are fetching using a query, instead of

select * from table_name

you should use

select * from db_name.table_name
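As an alternative to dbtable, Spark's JDBC source also accepts a query option (available since Spark 2.4), which lets you embed the schema-qualified table name directly in the SQL. A sketch reusing the connection settings from the question:

```scala
// Sketch: use the JDBC source's `query` option instead of `dbtable`.
// `query` and `dbtable` are mutually exclusive; supply only one of them.
// Connection details reuse the question's Thrift Server endpoint.
val example = spark.read.format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/test2")
  .option("query", "select * from test2.example")
  .option("user", "username")
  .option("password", "123456")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .load()
```

Note this still goes through the Hive JDBC driver, so if the driver itself is echoing column names back as data (as the comments suggest), qualifying the name alone may not be enough.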