
At the moment, I am creating a dataframe from a tab-separated file with a header, like this:

val df = sqlContext.read.format("csv")
  .option("header", "true")
  .option("delimiter", "\t")
  .option("inferSchema", "true")
  .load(pathToFile)

I want to do exactly the same thing but with a String instead of a file. How can I do that?

MetallicPriest
  • I have found the answer here: https://stackoverflow.com/questions/39111918/can-i-read-a-csv-represented-as-a-string-into-apache-spark-using-spark-csv – MetallicPriest Mar 06 '19 at 17:29
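For reference, the approach in that linked question boils down to something like the sketch below. It assumes Spark 2.2+ (where DataFrameReader.csv accepts a Dataset[String]) and a SparkSession named spark; the sample string content is only illustrative.

// A sketch, assuming Spark 2.2+ and a SparkSession named spark.
import spark.implicits._

// The same kind of tab-separated content as the file, but held in a String.
val s = "a\tb\tc\n1\t2\t3\n4\t5\t6"

// Turn the lines into a Dataset[String] and let the CSV reader parse it.
val lines = spark.createDataset(s.split("\n").toSeq)

val df = spark.read
  .option("header", "true")
  .option("delimiter", "\t")
  .option("inferSchema", "true")
  .csv(lines)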

1 Answer


To the best of my knowledge, there is no built-in way to build a dataframe from a string. Yet, for prototyping purposes, you can create a dataframe from a local Seq (of tuples, or even plain strings) with toDF.

You could use that to your advantage to create a dataframe from a string.

scala> val s = "x,y,z\n1,2,3\n4,5,6\n7,8,9"
s: String =
    x,y,z
    1,2,3
    4,5,6
    7,8,9

// Split the string into lines.
scala> val data = s.split('\n')

// Then we extract the first element to use it as a header.
scala> val header = data.head.split(',')

scala> val df = data.tail.toSeq
    // converting the seq of strings to a DF with only one column
    .toDF("X")
    // splitting each string on the comma into an array column
    .select(split('X, ",") as "X")
    // extracting each column from the array and renaming them
    .select(header.indices.map(i => 'X.getItem(i).as(header(i))) : _*)

scala> df.show
+---+---+---+
|  x|  y|  z|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+

PS: if you are not in the Spark REPL, make sure to add import spark.implicits._ so you can use toDF(), and import org.apache.spark.sql.functions._ (or just split) for the split function; see the sketch below.
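For instance, outside the REPL the setup could look roughly like this (the app name is just illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split   // provides the split() column function

val spark = SparkSession.builder()
  .appName("csv-from-string")   // illustrative name
  .master("local[*]")
  .getOrCreate()

// Needed for toDF() and for the 'X symbol-to-column syntax used above.
import spark.implicits._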

Oli
  • Check https://stackoverflow.com/questions/39111918/can-i-read-a-csv-represented-as-a-string-into-apache-spark-using-spark-csv – MetallicPriest Mar 06 '19 at 17:30