
Whenever I try to use .show() method on the org.apache.spark.sql.DataFrame object, I get

java.sql.SQLException: Cannot convert column 2 to integer
java.lang.NumberFormatException: For input string: "{table_name.column_name}"

(Here `{table_name.column_name}` is a placeholder I added for the sake of this question.)

My SparkSession:

val sparkSession: SparkSession = SparkSession.builder()
    .appName("My sql connector session")
    .master("local")
    .enableHiveSupport()
    .getOrCreate()

My DataFrame loader:

  val currentDF: DataFrame = sparkSession.read
                      .format("jdbc")
                      .option("url", hive_url)
                      .option("driver", "org.apache.hive.jdbc.HiveDriver")
                      .option("user", username)
                      .option("password", password)
                      .option("inferSchema", "true")
                      .option("dbtable", "{table_name}") // placeholder
                      .load()

When I execute the following code

    currentDF.createOrReplaceTempView("std")
    val newDf = sparkSession.sql("select count(*) as count from std")
    newDf.show()

I get

+-----+
|count|
+-----+
|  726|
+-----+

And when I print the schema using currentDF.printSchema(), I see

|-- {table_name}.{column_name}: integer (nullable = true)

This indicates that currentDF contains data, yet every call to .show() fails with the error above. I searched but could not find the same error as mine: the exception reports the literal string {table_name.column_name}, not any dirty data from the table. I am at my wits' end and really confused by this. Any help would be appreciated.

Edit: I tried changing the schema of the DataFrame by casting all columns to string type, but I still get the same error.

PS : My build.sbt

name := "sbt-validation"

version := "0.1"

scalaVersion := "2.12.4"

libraryDependencies ++= Seq(
  "com.github.tototoshi" %% "scala-csv" % "1.3.6",
  "io.netty" % "netty-all" % "4.1.42.Final",
  "org.apache.hive" % "hive-jdbc" % "3.0.0",
  "com.lihaoyi" %% "requests" % "0.6.5",
  "mysql" % "mysql-connector-java" % "8.0.15",
  "org.apache.spark" %% "spark-sql" % "3.0.0",
  "org.apache.spark" %% "spark-hive" % "3.0.0",
  "org.apache.spark" %% "spark-core" % "3.0.0"
    exclude(name="ch.qos.logback", org="ch.qos.logback")
)
Sajal

1 Answer


Spark's default JDBC dialect quotes column identifiers with double quotes, and HiveServer2 treats double-quoted tokens as string literals, so every fetched row contains the literal column names (e.g. "table_name.column_name") instead of data. That is why count(*) works but .show() fails with a NumberFormatException. As a workaround, pass a subquery with an explicit column list (and an alias) as the dbtable option:

val sparkSession: SparkSession = SparkSession.builder()
    .appName("My sql connector session")
    .master("local")
    .enableHiveSupport()
    .getOrCreate()
val currentDF: DataFrame = sparkSession.read
    .format("jdbc")
    .option("url", hive_url)
    .option("driver", "org.apache.hive.jdbc.HiveDriver")
    .option("user", username)
    .option("password", password)
    .option("inferSchema", "true")
    .option("dbtable", "(select columna, columnb from schema_name.table_name) alias_name") // replace the table name and column list as per your requirement
    .load()

currentDF.createOrReplaceTempView("std")
val newDf = sparkSession.sql("select count(*) as count from std")
newDf.show()
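
Alternatively, you can keep using the bare table name by registering a custom JdbcDialect that quotes identifiers with backticks (Hive's identifier quoting) instead of double quotes. This is a sketch, not a full Hive dialect; register it before calling .load():

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Spark's generic JDBC dialect emits "col" (double quotes), which Hive
// reads as a string literal. Quoting with backticks makes Hive treat
// the token as a column identifier again.
object HiveDialect extends JdbcDialect {
  // Apply this dialect to HiveServer2 JDBC URLs.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hive2")

  // Quote identifiers the Hive way: `column_name`.
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

// Register once per JVM, before any read using the Hive JDBC URL.
JdbcDialects.registerDialect(HiveDialect)
```

After registering the dialect, the original `.option("dbtable", "table_name")` read should return real row values rather than the column-name strings.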
Srinivas