
Whenever I try to use .show() method on the org.apache.spark.sql.DataFrame object, I get

java.sql.SQLException: Cannot convert column 2 to integer
java.lang.NumberFormatException: For input string: "{table_name.column_name}"

(Here `{table_name.column_name}` is a placeholder I added for the sake of this question.)

My SparkSession:

val sparkSession: SparkSession = SparkSession.builder()
    .appName("My sql connector session")
    .master("local")
    .enableHiveSupport()
    .getOrCreate()

My DataFrame loader:

  val currentDF: DataFrame = sparkSession.read
                      .format("jdbc")
                      .option("url", hive_url)
                      .option("driver", "org.apache.hive.jdbc.HiveDriver")
                      .option("user", username)
                      .option("password", password)
                      .option("inferSchema", "true")
                      .option("dbtable", "{table_name}") // placeholder
                      .load()

When I execute the following code

    currentDF.createOrReplaceTempView("std")
    val newDf = sparkSession.sql("select count(*) as count from std")
    newDf.show()

I get

+-----+
|count|
+-----+
|  726|
+-----+

And when I print the schema using currentDF.printSchema(), I see

|-- {table_name}.{column_name}: integer (nullable = true)

This indicates that currentDF contains data, yet every call to .show() fails with the error above. I searched but could not find the same error as mine: the exception reports the literal string {table_name.column_name}, not any dirty data from the table. I am at my wits' end and really confused by this. Any help would be appreciated.

Edit: I tried changing the schema of the DataFrame by casting all columns to string type, but I still get the same error.

PS : My build.sbt

name := "sbt-validation"

version := "0.1"

scalaVersion := "2.12.4"

libraryDependencies ++= Seq(
  "com.github.tototoshi" %% "scala-csv" % "1.3.6",
  "io.netty" % "netty-all" % "4.1.42.Final",
  "org.apache.hive" % "hive-jdbc" % "3.0.0",
  "com.lihaoyi" %% "requests" % "0.6.5",
  "mysql" % "mysql-connector-java" % "8.0.15",
  "org.apache.spark" %% "spark-sql" % "3.0.0",
  "org.apache.spark" %% "spark-hive" % "3.0.0",
  "org.apache.spark" %% "spark-core" % "3.0.0"
    exclude(name="ch.qos.logback", org="ch.qos.logback")
)
Sajal

1 Answer


Spark's default JDBC dialect quotes column identifiers with double quotes, and HiveServer2 treats double-quoted tokens as string literals, so every fetched row contains the literal column names (e.g. "table_name.column_name") instead of data. That is why count(*) works but .show() fails with a NumberFormatException. As a workaround, pass a subquery with an explicit column list (and an alias) as the dbtable option:

val sparkSession: SparkSession = SparkSession.builder()
    .appName("My sql connector session")
    .master("local")
    .enableHiveSupport()
    .getOrCreate()
val currentDF: DataFrame = sparkSession.read
    .format("jdbc")
    .option("url", hive_url)
    .option("driver", "org.apache.hive.jdbc.HiveDriver")
    .option("user", username)
    .option("password", password)
    .option("inferSchema", "true")
    .option("dbtable", "(select columna, columnb from schema_name.table_name) alias_name") // replace the table name and column list as per your requirement
    .load()

currentDF.createOrReplaceTempView("std")
val newDf = sparkSession.sql("select count(*) as count from std")
newDf.show()
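
Alternatively, you can keep using the bare table name by registering a custom JdbcDialect that quotes identifiers with backticks (Hive's identifier quoting) instead of double quotes. This is a sketch, not a full Hive dialect; register it before calling .load():

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Spark's generic JDBC dialect emits "col" (double quotes), which Hive
// reads as a string literal. Quoting with backticks makes Hive treat
// the token as a column identifier again.
object HiveDialect extends JdbcDialect {
  // Apply this dialect to HiveServer2 JDBC URLs.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hive2")

  // Quote identifiers the Hive way: `column_name`.
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

// Register once per JVM, before any read using the Hive JDBC URL.
JdbcDialects.registerDialect(HiveDialect)
```

After registering the dialect, the original `.option("dbtable", "table_name")` read should return real row values rather than the column-name strings.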
Srinivas