
Hi, I'm trying the following operation in Scala:

I have two DataFrames. I want to compare their column names and then their column types. I started by extracting the (column name, type) pairs with dtypes, then sorted each array and printed it:

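import scala.util.Sorting

// dtypes returns Array[(String, String)]: (column name, type name)
val df1colArr = df1.dtypes
val df2colArr = df2.dtypes

// sort in place so both arrays are in the same order
Sorting.quickSort(df1colArr)
Sorting.quickSort(df2colArr)

println(df1colArr.deep.mkString("\n"))
println(df2colArr.deep.mkString("\n"))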

The output looks like this:

(age,IntegerType)
(color,StringType)
(dealer_id,StringType)
(first_name,StringType)
(id,IntegerType)
(last_name,StringType)
(loyalty_score,StringType)
(model,StringType)
(purchase_date,TimestampType)
(purchase_price,StringType)
(rank_dr,IntegerType)
(service_date,TimestampType)
(vin_num,StringType)

(age,IntegerType)
(color,StringType)
(dealer_id,StringType)
(first_name,StringType)
(id,IntegerType)
(last_name,StringType)
(loyalty_score,IntegerType)
(model,StringType)
(purchase_date,TimestampType)
(purchase_price,StringType)
(rank_dr,IntegerType)
(repeat_likely,IntegerType)
(service_date,TimestampType)
(vin_num,StringType)

Next, I have a simple utility to compare the two arrays above based on their value at index 0:

val col_similar: ( Array[(String,String)], Array[(String, String)] )=> String 
= (x,y) => {if (x(0).sameElements(y(0))) "similar" else "different"}

When I run the above code, I get the following error:

Error:(59, 105) value sameElements is not a member of (String, String)
val col_similar: ( Array[(String,String)], Array[(String, String)] ) => String 
= (x,y) => {if (x(0).sameElements(y(0))) "similar" else "different"}

Please help me understand why this code won't work. Thanks so much.

banditKing

1 Answer


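x(0) is a pair of strings (a tuple of type (String, String)), not a collection, so it has no sameElements method; that is exactly what the compiler error says. If you want to compare the arrays of pairs x and y, call sameElements on the arrays themselves: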

if (x sameElements y) ... else ...
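
As a minimal sketch of that fix, assuming the schemas are plain Array[(String, String)] values like the ones dtypes returns: colSimilar below is the corrected utility, while the SchemaCompare object, the Schema alias, and the onlyIn helper are hypothetical names introduced only for illustration. The sample arrays are shortened from the output in the question.

```scala
object SchemaCompare {
  // Assumed shape of df.dtypes: (column name, type name) pairs.
  type Schema = Array[(String, String)]

  // Corrected utility: sameElements is called on the arrays,
  // not on a single (String, String) element.
  val colSimilar: (Schema, Schema) => String =
    (x, y) => if (x sameElements y) "similar" else "different"

  // Hypothetical helper: pairs present in `x` but not in `y`,
  // i.e. columns that are missing from `y` or have another type there.
  def onlyIn(x: Schema, y: Schema): Array[(String, String)] =
    x.filterNot(pair => y.contains(pair))

  def main(args: Array[String]): Unit = {
    val a: Schema = Array(
      ("age", "IntegerType"),
      ("loyalty_score", "StringType"))
    val b: Schema = Array(
      ("age", "IntegerType"),
      ("loyalty_score", "IntegerType"),
      ("repeat_likely", "IntegerType"))

    println(colSimilar(a, a))            // similar
    println(colSimilar(a, b))            // different
    println(onlyIn(a, b).mkString(", ")) // (loyalty_score,StringType)
    println(onlyIn(b, a).mkString(", ")) // (loyalty_score,IntegerType), (repeat_likely,IntegerType)
  }
}
```

Note that sameElements is order-sensitive, so both arrays should be sorted first, as in the question. A helper like onlyIn additionally shows which columns differ rather than just whether they do.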

By the way, I doubt that this approach will scale to actual datasets - collecting the entire dataframe to the master node is usually a bad idea. Maybe you can find some better ideas here.

Andrey Tyukin