
I work with Microsoft Databricks, and there is a simple function to save a PySpark dataframe as a table:

table_name = 'location.table_name'
df.write.saveAsTable(table_name)

However, this does not work with a pandas dataframe, and converting it first is problematic.

What I need is a function that, given only two arguments, a dataframe and a table name, does the same thing.

Should look like this:

def save_pandas_to_SQL(df, table_name):
    """Save the pandas dataframe df as the table table_name."""
  • You can convert the pandas dataframe to a Spark dataframe first and then save it. See: https://learn.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/spark-pandas – 过过招 Feb 15 '22 at 03:21
  • It does not always give the expected result; it often has issues with the schema. – Random Person Feb 15 '22 at 03:29
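A minimal sketch of the kind of wrapper the question asks for, assuming it runs in a Databricks notebook where a SparkSession is available. Internally it still converts to Spark, and the overwrite mode is an assumption, not something the question specifies; if the inferred schema is the part that misbehaves, an explicit schema can be passed instead, as sketched after the answer below.

from pyspark.sql import SparkSession

def save_pandas_to_SQL(df, table_name):
    """Save a pandas dataframe as a table by converting it to Spark first."""
    spark = SparkSession.builder.getOrCreate()  # returns the active session on Databricks
    spark_df = spark.createDataFrame(df)        # schema is inferred from the pandas dtypes
    spark_df.write.mode("overwrite").saveAsTable(table_name)

save_pandas_to_SQL(df, 'location.table_name')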

1 Answer

import pandas as pd    
data = [['Scott', 50], ['Jeff', 45], ['Thomas', 54],['Ann',34]] 
 
# Create the pandas DataFrame 
pandasDF = pd.DataFrame(data, columns = ['Name', 'Age']) 

First transform your pandas DataFrame into a Spark DataFrame, then save it as a table:

sparkDF = spark.createDataFrame(pandasDF) 
sparkDF.printSchema()
sparkDF.show()

table_name = 'location.table_name'
sparkDF.write.saveAsTable(table_name)

root
 |-- Name: string (nullable = true)
 |-- Age: long (nullable = true)

+------+---+
|  Name|Age|
+------+---+
| Scott| 50|
|  Jeff| 45|
|Thomas| 54|
|   Ann| 34|
+------+---+
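If schema inference is what breaks the conversion, as the comments suggest, one option that is not part of the original answer is to hand spark.createDataFrame an explicit schema, so nothing has to be inferred from the pandas dtypes:

from pyspark.sql.types import StructType, StructField, StringType, LongType

# Explicit schema matching the sample data above; adjust the fields to your own columns.
schema = StructType([
    StructField("Name", StringType(), True),
    StructField("Age", LongType(), True),
])

sparkDF = spark.createDataFrame(pandasDF, schema=schema)  # no inference step
sparkDF.write.saveAsTable(table_name)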
JAdel
  • As mentioned above, the problem is with the conversion; I cannot convert it first, it needs to be done directly from a pandas dataframe. – Random Person Feb 17 '22 at 19:10
  • https://stackoverflow.com/questions/47393001/how-to-save-a-huge-pandas-dataframe-to-hdfs maybe this can help. Databricks has its own file store called DBFS, but it works like HDFS, so I think this can work (a rough sketch follows below). – JAdel Feb 18 '22 at 09:09
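A rough sketch of that route: write a parquet file from pandas through the /dbfs local-file mount, then let Spark read it back, so the table schema comes from the parquet metadata rather than from spark.createDataFrame inference. The path is hypothetical, and to_parquet needs pyarrow, which Databricks ships by default.

# Hypothetical DBFS path; the /dbfs prefix exposes DBFS to ordinary file APIs on the driver.
pandasDF.to_parquet("/dbfs/tmp/people.parquet")

# Spark reads the schema from the parquet metadata, bypassing createDataFrame entirely.
spark.read.parquet("dbfs:/tmp/people.parquet").write.saveAsTable(table_name)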