
I am new to Spark Scala and I am trying to run a SQL query on a CSV file and return the records. Below is what I have, but it is not working:

val file = sc.textFile("file:///data/home_data.csv")
val records = file.sqlContext("SELECT id FROM home_data WHERE yr_built < 1979")
combined.collect().foreach(records)

I get errors with the file.sqlContext function.

Thanks

Satish
  • Have you checked this https://stackoverflow.com/questions/43508054/spark-sql-how-to-read-a-tsv-or-csv-file-into-dataframe-and-apply-a-custom-sche – sawan Oct 30 '17 at 04:44
  • will you provide the error in question – sawan Oct 30 '17 at 04:45
  • Check the answer of this question https://stackoverflow.com/questions/29704333/spark-load-csv-file-as-dataframe – sawan Oct 30 '17 at 04:47

1 Answer


You can use a case class to map the data to the respective field names and datatypes, then run your query:

case class Person(first_name: String, last_name: String, age: Int)
// p is the RDD of raw CSV lines, e.g. from sc.textFile(...)
val pmap = p.map(line => line.split(","))
val personRDD = pmap.map(fields => Person(fields(0), fields(1), fields(2).toInt))
val personDF = personRDD.toDF()

Then register personDF as a temporary view and query it with SQL.

I don't know your schema, so I formulated it this way.
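If you are on Spark 2.x, you can also skip the case class entirely and let the built-in CSV reader infer the schema, then query via a temp view. A minimal sketch, using the path and columns from your question and assuming the file has a header row:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("csv-sql").getOrCreate()

// Read the CSV into a DataFrame; header/inferSchema are assumptions about the file
val homeDF = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("file:///data/home_data.csv")

// Register a temp view so it can be queried with plain SQL
homeDF.createOrReplaceTempView("home_data")

val records = spark.sql("SELECT id FROM home_data WHERE yr_built < 1979")
records.show()
```

This avoids the manual split/map step and handles quoting and type conversion for you.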

marc_s
loneStar
  • Sorry, I did not understand your response. Basically, I am just looking to query a CSV with SQL and show the results. Can you help please? Trying to do something like this: val file = sc.textFile("file:///data/home_data.csv") val records = file.sql("SELECT id FROM data_table WHERE size < 100") records.show – Satish Oct 30 '17 at 13:16