I'm trying to build some skills in Spark SQL, mainly with DataFrames. What I'm trying to do is:
- join two DataFrames
- apply a UDF that takes two columns and returns a tuple
- split the resulting tuple into two separate columns
The code I'm using is
import org.apache.spark.sql.functions.{col, udf}

val profDF = Seq((1, "James", "Detective"),
                 (2, "Harvey", "Captain"),
                 (3, "Barbara", "Club Owner")).toDF("ID", "Name", "Occ")

val persDF = Seq((1, 30, "Single"),
                 (2, 35, "Married"),
                 (3, 30, "Single")).toDF("ID", "Age", "Status")

val upperAdd2: (String, Int) => (String, Int) = (name, age) => (name.toUpperCase, age + 2)
val upAddUDF = udf(upperAdd2)

profDF.join(persDF, profDF("ID") === persDF("ID"))
  .withColumn("Result", upAddUDF(profDF("Name"), persDF("Age")))
  .withColumn("Caps Name", col("Result._1"))
  .withColumn("More Age", col("Result._2"))
  .drop("Result")
  .show
I get the following error when executing the last statement:
org.apache.spark.sql.AnalysisException: Reference 'ID' is ambiguous, could be: ID#4, ID#12.;
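The UDF logic itself doesn't seem to be the problem; the plain Scala function behind it behaves as I expect when called outside Spark (no SparkContext needed for this check):

```scala
// Plain Scala check of the function wrapped by the UDF -- no Spark involved
val upperAdd2: (String, Int) => (String, Int) =
  (name, age) => (name.toUpperCase, age + 2)

println(upperAdd2("James", 30)) // prints (JAMES,32)
```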
I think I've properly qualified "ID" with its DataFrame, because the following statement works just fine:

profDF.join(persDF, profDF("ID") === persDF("ID"))
  .withColumn("Result", upAddUDF(profDF("Name"), persDF("Age")))
  .show
Is there anything else I need to specify here?
I'm using Spark 1.6.0 and the code is only for learning purposes. Please let me know if it can be improved further.