2

I'm trying to save csv file as a result of SQL query, sent to Athena via Databricks. The file is supposed to be a big table of about 4-6 GB (~40m rows).

I'm doing the next steps:

  1. Creating PySpark dataframe by:

    df = sqlContext.sql("select * from my_table where year = 19")
    
  2. Converting PySpark dataframe to Pandas dataframe. I realize, this step may be unnecessary, but I only start using Databricks and may not know the required commands to do it more swiftly. So I do it like this:

    ab = df.toPandas()
    
  3. Save the file somewhere to download it locally later:

    ab.to_csv('my_my.csv')
    

But how do I download it?

I kindly ask you to be very specific as I do not know many tricks and details in working with Databricks.

James Z
  • 12,209
  • 10
  • 24
  • 44

0 Answers0