
A Spark DataFrame, df, has the following column names:

scala> df.columns
res6: Array[String] = Array(Age, Job, Marital, Education, Default, Balance,     
Housing, Loan, Contact, Day, Month, Duration, Campaign, pdays, previous,   
poutcome, Approved)

A SQL query on df by column name works fine:

scala> spark.sql(""" select Age from df limit 2 """).show()
+---+
|Age|
+---+
| 30|
| 33|
+---+
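
(For spark.sql to resolve the table name df, the DataFrame must have been registered as a view beforehand; a minimal sketch, assuming the view name was chosen to match the variable name:

// register df as a temporary view so SQL queries can reference it by name
df.createOrReplaceTempView("df")

)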

But when I try to use withColumn on df I run into problems:

scala> val dfTemp = df.withColumn("temp", df.Age.cast(DoubleType))
.drop("Age").withColumnRenamed("temp", "Age")
<console>:38: error: value Age is not a member of org.apache.spark.sql.DataFrame

The above code is taken from here.

Thanks

shanlodh

1 Answer


df.Age is not a valid way of referencing a column of a DataFrame in Scala: unlike PySpark, the Scala DataFrame class does not expose columns as fields. The correct way is

import org.apache.spark.sql.types.DoubleType

val dfTemp = df.withColumn("temp", df("Age").cast(DoubleType))

or you can do

val dfTemp = df.withColumn("temp", df.col("Age").cast(DoubleType))

or you can do

import org.apache.spark.sql.functions.col
val dfTemp = df.withColumn("temp", col("Age").cast(DoubleType))
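
As a side note, the temp/drop/withColumnRenamed dance in the question can be skipped entirely: passing an existing column name to withColumn overwrites that column in place. A minimal sketch, assuming df has the schema shown in the question:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// withColumn with an existing name replaces that column,
// so no temporary column, drop, or rename is needed
val dfTemp = df.withColumn("Age", col("Age").cast(DoubleType))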

Note: df.withColumn("temp", df.Age.cast(DoubleType())) is valid in PySpark, because PySpark DataFrames expose columns as attributes.

Ramesh Maharjan