I have four arrays that contain the column names from four data frames.

var col1 = df1.columns
var col2 = df2.columns
var col3 = df3.columns
var col4 = df4.columns

They are all Array[String]. The problem is to identify the columns that occur in all four arrays and those that do not. I guess one could start by finding the intersection of two arrays and then loop over the rest. Any ideas? Can this be extended to N arrays?

So the idea is not just to identify the intersection of two arrays but of multiple arrays, and also to identify the difference.

Jacek Laskowski
Leothorn
  • 2
    Not really a duplicate of that question - the OP seems to be interested in intersecting the _column names_, not their actual values – Tzach Zohar Jun 21 '17 at 13:37
  • 2
    Possible duplicate of [Comparing two array columns in Scala Spark](https://stackoverflow.com/questions/44158623/comparing-two-array-columns-in-scala-spark) – jwvh Jun 21 '17 at 19:47

1 Answer

You can create a List of these arrays, and use reduce with the intersect function:

List(col1, col2, col3, col4).reduce((a, b) => a intersect b)
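
The same reduce works for any number of arrays, and the columns that are not common to all of them can be derived from its result. Below is a minimal sketch, assuming column names are unique within each array. The object name `ColumnSets`, the helpers `commonColumns` and `nonCommonColumns`, and the sample column names are hypothetical and only stand in for `df1.columns` ... `df4.columns`:

```scala
object ColumnSets {
  // Hypothetical sample data standing in for df1.columns ... df4.columns
  val col1: Array[String] = Array("id", "name", "age", "city")
  val col2: Array[String] = Array("id", "name", "salary")
  val col3: Array[String] = Array("name", "id", "dept")
  val col4: Array[String] = Array("id", "name", "age")

  // Columns present in every array. Works for any N >= 1;
  // reduce throws on an empty sequence.
  def commonColumns(cols: Seq[Array[String]]): Array[String] =
    cols.reduce(_ intersect _)

  // Columns present in at least one array but not in all of them.
  def nonCommonColumns(cols: Seq[Array[String]]): Array[String] = {
    val common = commonColumns(cols).toSet
    cols.flatMap(_.toSeq).distinct.filterNot(common.contains).toArray
  }
}
```

With the sample data, `ColumnSets.commonColumns(Seq(col1, col2, col3, col4))` yields `id` and `name`, and `ColumnSets.nonCommonColumns` on the same input yields `age`, `city`, `salary` and `dept`.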
Tzach Zohar