I am trying to read a set of XML files nested in many folders into sequence files in Spark. I can get the file names using the function recursiveListFiles from "How do I list all files in a subdirectory in scala?":

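import java.io.File

// Recursively list every file and directory under f.
def recursiveListFiles(f: File): Array[File] = {
  // listFiles returns null if f is not a directory or cannot be read
  val these = Option(f.listFiles).getOrElse(Array.empty[File])
  these ++ these.filter(_.isDirectory).flatMap(recursiveListFiles)
}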

But how do I read each file's content as a separate column?

Nicktar
VSe

1 Answer

What about using Spark's wholeTextFiles method and parsing the XML yourself afterwards?
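
One way this could look, as a rough sketch rather than a tested solution: collect the .xml paths with a recursive listing like the one in the question, then hand them to wholeTextFiles as a comma-separated list, which it accepts. The names xmlFilesUnder and readXmlFiles are made up for illustration, and the sketch assumes the files sit on a local filesystem that Spark reads by default and that no path contains a comma.

```scala
import java.io.File
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: recursively collect every regular file under `dir`
// whose name ends in ".xml" (case-insensitive).
// listFiles returns null for non-directories or on I/O errors, hence the Option.
def xmlFilesUnder(dir: File): Seq[File] = {
  val entries = Option(dir.listFiles).map(_.toSeq).getOrElse(Seq.empty)
  val xmlHere = entries.filter(f => f.isFile && f.getName.toLowerCase.endsWith(".xml"))
  xmlHere ++ entries.filter(_.isDirectory).flatMap(xmlFilesUnder)
}

// Hypothetical helper: returns (file path, whole file content) pairs.
// Assumes local paths are readable by Spark and contain no commas.
def readXmlFiles(sc: SparkContext, root: String): RDD[(String, String)] = {
  val paths = xmlFilesUnder(new File(root)).map(_.getAbsolutePath)
  if (paths.isEmpty) sc.emptyRDD[(String, String)]
  else sc.wholeTextFiles(paths.mkString(","))
}
```

With that in place, `readXmlFiles(sc, "mainpath")` gives one record per file. If a DataFrame is wanted, `.toDF("path", "content")` after `import spark.implicits._` should put the content in its own column, and each content string can then be parsed, for instance with `scala.xml.XML.loadString` if scala-xml is on the classpath.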

Georg Heiler
  • I tried the wholeTextFiles method, but I can't use a .xml pattern to select only the XML files in the folders, something like `sc.wholeTextFiles("mainpath/*.xml")`. – VSe Dec 06 '19 at 08:59