
I Googled for a way to create a table with Databricks and Azure SQL Server, and load data into that same table. I found some sample code online that seems pretty straightforward, but apparently there is an issue somewhere. Here is my code.

CREATE TABLE MyTable
USING org.apache.spark.sql.jdbc 
OPTIONS (
  url "jdbc:sqlserver://server_name_here.database.windows.net:1433;database = db_name_here",
  user "u_name",
  password "p_wd",
  dbtable "MyTable"
);

Now, here is my error.

Error in SQL statement: SQLServerException: Invalid object name 'MyTable'.

Unfortunately, my password has spaces in it. That could perhaps be the problem, but I don't think so.

Basically, I would like this to loop recursively through the files in a folder and its sub-folders, and load every file whose name matches a string pattern, like 'ABC*', into a table. The blocker is that I also need the file name loaded into a field. So, I want to load data from MANY files into 4 fields of actual data, plus 1 field that captures the file name, since the file name is the only way I can distinguish the different data sets. Is this possible? Or is this an exercise in futility?

– ASH (edited by Dale K)

2 Answers


My suggestion is to use the Azure SQL Spark library, as also mentioned in the documentation:

https://docs.databricks.com/spark/latest/data-sources/sql-databases-azure.html#connect-to-spark-using-this-library

'Bulk Copy' is what you want to use for good performance. Just load your file into a DataFrame and bulk copy it to Azure SQL:

https://docs.databricks.com/data/data-sources/sql-databases-azure.html#bulk-copy-to-azure-sql-database-or-sql-server
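
For reference, a minimal sketch of what that looks like with the azure-sqldb-spark connector described in those docs (the server, database, credentials, and table name below are placeholders, and df is assumed to be the DataFrame loaded from your files):

import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Connection and bulk-copy settings; every value here is a placeholder.
val bulkCopyConfig = Config(Map(
  "url"               -> "server_name_here.database.windows.net",
  "databaseName"      -> "db_name_here",
  "user"              -> "u_name",
  "password"          -> "p_wd",
  "dbTable"           -> "MyTable",
  "bulkCopyBatchSize" -> "2500",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "600"
))

// Bulk copy the DataFrame into the Azure SQL table.
df.bulkCopyToSqlDB(bulkCopyConfig)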

To read files from subfolders, the answer is here:

How to import multiple csv files in a single load?
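
In short, Spark's path globbing already covers subfolders; a minimal sketch (the year/month/day folder layout here is hypothetical):

// Wildcards walk the subfolders and pick up only files matching ABC*.
val df = spark.read
  .option("sep", "|")
  .option("header", "false")
  .csv("/mnt/rawdata/*/*/*/client/ABC*.gz")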

– mauridb
    I started with Spark and switched to SQL, only because I couldn't get Spark to do what I needed. I'll take a second look at that option. Thanks!! – ASH Oct 08 '19 at 16:26

I finally, finally, finally got this working.

val myDFCsv = spark.read.format("csv")
   .option("sep", "|")            // pipe-delimited files
   .option("inferSchema", "true") // let Spark work out the column types
   .option("header", "false")
   .load("/mnt/rawdata/2019/01/01/client/ABC*.gz") // note the leading slash on the DBFS mount; the glob picks up every ABC* file

myDFCsv.show()
myDFCsv.count()
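
To also capture which file each row came from, which was the whole point, Spark's built-in input_file_name() function can add the source path as a column; a minimal sketch:

import org.apache.spark.sql.functions.input_file_name

// Add a column holding the full path of the file each row was read from.
val myDFCsvWithFile = myDFCsv.withColumn("file_name", input_file_name())
myDFCsvWithFile.show()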

Thanks for pointing me in the right direction, mauridb!!

– ASH