
I am a beginner in Scala and I want to loop through each line of a file that I am reading in as below:

val data = sc.textFile("D:/Data.csv")

Data.csv is like below:

1,462,0,NA,0,1,0,Friday,1,5
1,147,33,NA,0,1,0,Friday,1,5
1,105,58,NA,0,1,0,Friday,1,5
1,276,96,NA,0,1,0,Friday,1,5
1,466,1,NA,0,1,0,Friday,1,5
1,466,1,NA,0,1,0,Friday,1,5
1,466,1,NA,0,1,0,Friday,1,5

I want to iterate through each line in the above CSV and print the 1st and 3rd column values in each row. Any help is appreciated.

Ricky
  • Be aware that working with Spark and RDDs is very different from regular Scala code. Your code runs remotely and you can't use loops as you normally would. – puhlen Jul 11 '17 at 13:47
  • Possible duplicate of [How do I iterate RDD's in apache spark (scala)](https://stackoverflow.com/questions/25914789/how-do-i-iterate-rdds-in-apache-spark-scala) – stefanobaghino Jul 11 '17 at 13:47

1 Answer

val data = sc.textFile("D:/Data.csv")

data.map(_.split(','))
    .foreach(r => println(r(0), r(2)))

The map call above splits each line of the file on a comma, turning each line into an Array[String] and creating an RDD[Array[String]]: each element of this RDD is an Array[String] holding the column values of one line.
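
If you want to sanity-check that intermediate RDD first, a small sketch like the following should work (it assumes the same sc and data as above); take(3) brings only the first three split rows back to the driver, so nothing large leaves the cluster:

// Inspect the first three split rows locally on the driver
data.map(_.split(','))
    .take(3)
    .foreach(arr => println(arr.mkString(" | ")))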

The foreach call prints the first and third column values of each line (i.e., the first and third elements in each Array[String] in the RDD):

(1,0)
(1,1)
(1,1)
(1,33)
(1,1)
(1,58)
(1,96)
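
Note that the printed pairs may not appear in file order: foreach is evaluated on the RDD's partitions in parallel, and on a real cluster the println output goes to the executors' stdout rather than to the driver console. If you need the pairs on the driver (and the dataset is small enough to fit in its memory), one option is to collect first, which also preserves the original line order here:

// Pull the (first, third) pairs back to the driver, then print them there
data.map(_.split(','))
    .map(r => (r(0), r(2)))
    .collect()
    .foreach(println)
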
Jeffrey Chung