Suppose I have two files, file1 and file2, in the dataset directory:

val file = sc.wholeTextFiles("file:///root/data/dataset").map((x,y) => y + "," + x)

With the code above I am trying to build an RDD in which each element is a single string of the form value,key (the file content followed by the file name).

For example, suppose each file contains two records:

file1:

1,30,ssr

2,43,svr

And

file2:

1,30,psr

2,43,pvr

The desired RDD output is:

(1,30,ssr,file1),(2,43,svr,file1),(1,30,psr,file2),(2,43,pvr,file2)

Can this be achieved? If so, please help!


1 Answer

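// Read every file in the directory as a (path, content) pair.
val files = sc.wholeTextFiles("file:///root/data/dataset")

val yourNeededRdd = files.flatMap {
  case (filePath, fileContent) =>
    // Keep only the last path segment, e.g. "file1".
    val fileName = filePath.split('/').last
    // Append the file name to every line of the file.
    fileContent.split("\n").map(line => line + "," + fileName)
}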
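
The per-file step can also be pulled out into a plain function so it can be checked without a Spark cluster. This is only a sketch: tagLines is a hypothetical helper name, not something from the original answer, and it assumes you want blank lines dropped and Windows line endings (\r\n) tolerated. A trailing carriage return is one possible reason for ",filename" appearing at the start of a printed record, but that is a guess, not something confirmed in this thread.

```scala
// Hypothetical helper (not part of the original answer): turns one
// (path, content) pair from wholeTextFiles into "record,fileName" strings.
def tagLines(filePath: String, fileContent: String): Seq[String] = {
  // Last path segment, e.g. "file:/root/data/dataset/file1" -> "file1".
  val fileName = filePath.split('/').last
  fileContent
    .split("\\r?\\n")          // assumption: accept both \n and \r\n line endings
    .toSeq
    .filter(_.trim.nonEmpty)   // assumption: blank lines are not records
    .map(line => s"$line,$fileName")
}
```

With Spark it would be used as files.flatMap { case (path, content) => tagLines(path, content) }, which should give the same shape of output as the snippet above.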
sarveshseri
  • Thanks @Sarvesh Kumar Singh. It is working, but ",filename" is being added at the start of the records. – Srinathji Kyadari Aug 11 '16 at 15:36
  • Thanks @Sarvesh Kumar Singh, it's working. I just swapped fileName and line in the map function, and that gives the correct result. – Srinathji Kyadari Aug 11 '16 at 16:02
  • Just to state the obvious (as a warning for other people reading this answer): with this solution, no single file in the folder can be larger than local RAM. That's basically what "whole" suggests. – dk14 Sep 20 '17 at 07:05