
I'm trying to extend my RDD table by one column (with string values) using this question's answers, but I cannot add a column name this way. I'm using Scala.

Is there any easy way to add a column to RDD?


Apache Spark takes a functional approach to data processing. Fundamentally, an RDD[T] is a distributed collection of objects of type T (RDD stands for Resilient Distributed Dataset).

Following the functional approach, you process the objects inside an RDD using transformations. A transformation does not modify an RDD in place; it constructs a new RDD from a previous one.

One example of a transformation is the map method. Using map, you can transform each object of your RDD into any other type of object you need. So, if you have a data structure that represents a row, you can transform that structure into a new one with an added column.
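Since the question asks for a column *name*, one option is to make the row structure a case class, so the added column becomes a named field. The sketch below assumes hypothetical types `Greeting` and `GreetingWithJoined` (and a hypothetical field `joined`); none of these come from the original answer. To keep it runnable without a cluster, the row-widening function is exercised on a local `List`; the same function could be passed unchanged to `rdd.map`.

```scala
// Hypothetical row types: Greeting has two named fields,
// GreetingWithJoined is the same row plus one extra named "column".
final case class Greeting(first: String, second: String)
final case class GreetingWithJoined(first: String, second: String, joined: String)

object AddNamedColumn {
  // Builds the wider row from the narrower one; `joined` is the new column.
  def addJoined(g: Greeting): GreetingWithJoined =
    GreetingWithJoined(g.first, g.second, g.first + g.second)

  // Local sample data standing in for the contents of an RDD[Greeting].
  val rows: List[Greeting] = List(Greeting("Hello", "World"), Greeting("Such", "Wow"))

  // With Spark, the equivalent call would be rdd.map(addJoined).
  val widened: List[GreetingWithJoined] = rows.map(addJoined)

  def main(args: Array[String]): Unit =
    widened.foreach(r => println(s"${r.first} | ${r.second} | ${r.joined}"))
}
```

Compared with a plain tuple, where the new value is only reachable as `_3`, the case class lets later code refer to it as `row.joined`.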

For example, take the following piece of code.

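```scala
import org.apache.spark.rdd.RDD

// `sc` is an existing SparkContext (e.g. the one provided by spark-shell)
val rdd: RDD[(String, String)] = sc.parallelize(List(("Hello", "World"), ("Such", "Wow")))
// This new RDD will have one more "column",
// which is the concatenation of the previous two
val rddWithOneMoreColumn: RDD[(String, String, String)] =
  rdd.map {
    case (a, b) =>
      (a, b, a + b)
  }
```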

In this example, an RDD of Tuple2 (a.k.a. a pair) is transformed into an RDD of Tuple3 simply by applying a function to each element of the RDD.

Note that you have to apply an action (for example, collect or count) to rddWithOneMoreColumn to make the computation happen: Apache Spark computes the results of all your transformations lazily.
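A minimal end-to-end sketch of the same tuple pipeline, assuming `spark-core` is on the classpath and running Spark in local mode (the object name `LazyColumnDemo` and the `local[1]` master are illustrative choices, not part of the original answer). The `map` call only records the transformation; the job actually runs when the `collect()` action is invoked.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LazyColumnDemo {
  def run(): Array[(String, String, String)] = {
    // Local mode with a single worker thread; no cluster needed.
    val conf = new SparkConf().setAppName("LazyColumnDemo").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val rdd: RDD[(String, String)] =
        sc.parallelize(List(("Hello", "World"), ("Such", "Wow")))
      // Transformation: builds a new RDD lazily, nothing is computed yet.
      val rddWithOneMoreColumn: RDD[(String, String, String)] =
        rdd.map { case (a, b) => (a, b, a + b) }
      // Action: collect() triggers the computation and returns rows to the driver.
      rddWithOneMoreColumn.collect()
    } finally {
      sc.stop() // release the local SparkContext
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

Returning the collected array from `run()` keeps the Spark setup in one place and makes the result easy to check without parsing console output.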

riccardo.cardin