
I am new to Spark with Scala and am confused by the notation (x, y) in some places and x._1, y._1 in others, especially when one is used rather than the other in Spark transformations.

Could someone explain whether there is a rule of thumb for when to use each of these syntaxes?

neil

2 Answers


Basically, there are two ways to access the elements of a tuple parameter in an anonymous function. They are functionally equivalent, so use whichever you prefer.

  1. Through the accessors _1, _2, ...
  2. Through pattern matching into variables with meaningful names

    val tuples = Array((1, 2), (2, 3), (3, 4))
    
    // Attributes
    tuples.foreach { t => 
      println(s"${t._1} ${t._2}")
    }
    
    // Pattern matching
    tuples.foreach { t =>
      t match {
        case (first, second) =>
          println(s"$first $second")
      }
    }
    
    // Pattern matching can also be written as
    tuples.foreach { case (first, second) =>
      println(s"$first $second")
    }
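
For the Spark side of the question, here is a minimal sketch. It uses a plain Scala Seq as a stand-in for an RDD (an assumption, so that it runs without a Spark dependency), and the object name and sample data are made up for illustration. It also shows a case that I suspect causes much of the confusion: in a function literal such as (x, y) => x + y, the (x, y) is a list of two separate parameters, not a tuple.

```scala
// Plain Scala collections stand in for a Spark RDD here (an assumption made
// so the sketch runs without Spark). All names are hypothetical.
object TupleAccessSketch {
  // Sample data: (word, count) pairs.
  val pairs: Seq[(String, Int)] = Seq(("a", 1), ("b", 2), ("a", 3))

  // Positional accessors: t is a single argument that happens to be a tuple.
  val keysByAccessor: Seq[String] = pairs.map(t => t._1)

  // Pattern matching: the single tuple argument is destructured into names.
  val keysByPattern: Seq[String] = pairs.map { case (word, _) => word }

  // Here (x, y) is NOT a tuple: reduce passes two separate Int arguments.
  val total: Int = pairs.map(_._2).reduce((x, y) => x + y)

  // Both at once: x and y are two arguments, and each one is itself a tuple,
  // so x._2 and y._2 read the count out of each pair.
  val largest: (String, Int) =
    pairs.reduce((x, y) => if (x._2 >= y._2) x else y)
}
```

So a rough rule of thumb: when the function receives one tuple, pick between t._1 and case (a, b) by readability; when the function receives two arguments, (x, y) => ... is simply its parameter list.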
    
Kien Truong

The notation (x, y) is a tuple of two elements, x and y. There are several ways to access the individual values in a tuple. You can use the ._1, ._2 notation to get at the elements:

    val tup = (3, "Hello")    // A tuple with two elements

    val number = tup._1       // Gets the first element (3) from the tuple
    val text = tup._2         // Gets the second element ("Hello") from the tuple

You can also use pattern matching. One way to extract the two values is like this:

    val (number, text) = tup

Unlike a collection (for example, a List), a tuple has a fixed number of values (not necessarily two), and the values can have different types (such as an Int and a String in the example above).
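
To illustrate the point about arity and mixed types, here is a small sketch with a three-element tuple and a nested tuple. The object name, field names, and values are hypothetical. The nested shape (key, (left, right)) is what a join on key-value pairs in Spark gives you, which is one place where chained accessors like ._2._1 tend to show up.

```scala
// All names and values below are made up for illustration.
object TupleAritySketch {
  // A three-element tuple with three different element types.
  val record: (Int, String, Double) = (1, "widget", 9.99)

  // Positional accessors work for any arity: _1, _2, _3, ...
  val id: Int = record._1
  val price: Double = record._3

  // Destructuring also works for any arity.
  val (recordId, name, cost) = record

  // A nested tuple: a key paired with a tuple of two values.
  val joined: (String, (Int, Double)) = ("widget", (3, 9.99))

  // Chained accessors reach into the inner tuple...
  val quantity: Int = joined._2._1

  // ...while a nested pattern names every part in one step.
  val (key, (qty, unitPrice)) = joined
}
```

With nested tuples the pattern form is usually easier to read than a chain of accessors.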

There are many tutorials about Scala tuples, for example: Scala tuple examples and syntax.

Jesper