
I Googled for a way to create a table with Databricks and Azure SQL Server, and load data into that same table. I found some sample code online that seems pretty straightforward, but apparently there is an issue somewhere. Here is my code.

CREATE TABLE MyTable
USING org.apache.spark.sql.jdbc 
OPTIONS (
  url "jdbc:sqlserver://server_name_here.database.windows.net:1433;database = db_name_here",
  user "u_name",
  password "p_wd",
  dbtable "MyTable"
);

Now, here is my error.

Error in SQL statement: SQLServerException: Invalid object name 'MyTable'.

Unfortunately, my password has spaces in it. That could perhaps be the problem, but I don't think so.

Basically, I would like this to loop recursively through the files in a folder and its sub-folders, and load every file whose name matches a string pattern, like 'ABC*', into a table. The blocker is that I also need the file name loaded into a field. So, I want to load data from MANY files into 4 fields of actual data, plus 1 field that captures the file name, since the file name is the only way I can distinguish the different data sets. Is this possible? Or is this an exercise in futility?

– ASH (edited by Dale K)

2 Answers


My suggestion is to use the Azure SQL Spark library, as also mentioned in the documentation:

https://docs.databricks.com/spark/latest/data-sources/sql-databases-azure.html#connect-to-spark-using-this-library

'Bulk Copy' is what you want to use for good performance. Just load your file into a DataFrame and bulk copy it to Azure SQL:

https://docs.databricks.com/data/data-sources/sql-databases-azure.html#bulk-copy-to-azure-sql-database-or-sql-server
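
For reference, a minimal sketch of what that looks like with the azure-sqldb-spark connector described in those docs (the server, database, credentials, and table name below are placeholders, and df is assumed to be the DataFrame loaded from your files):

import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Connection and bulk-copy settings; every value here is a placeholder.
val bulkCopyConfig = Config(Map(
  "url"               -> "server_name_here.database.windows.net",
  "databaseName"      -> "db_name_here",
  "user"              -> "u_name",
  "password"          -> "p_wd",
  "dbTable"           -> "MyTable",
  "bulkCopyBatchSize" -> "2500",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "600"
))

// Bulk copy the DataFrame into the Azure SQL table.
df.bulkCopyToSqlDB(bulkCopyConfig)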

To read files from subfolders, the answer is here:

How to import multiple csv files in a single load?
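
In short, Spark's path globbing already covers subfolders; a minimal sketch (the year/month/day folder layout here is hypothetical):

// Wildcards walk the subfolders and pick up only files matching ABC*.
val df = spark.read
  .option("sep", "|")
  .option("header", "false")
  .csv("/mnt/rawdata/*/*/*/client/ABC*.gz")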

– mauridb
    I started with Spark and switched to SQL, only because I couldn't get Spark to do what I needed. I'll take a second look at that option. Thanks!! – ASH Oct 08 '19 at 16:26

I finally, finally, finally got this working.

val myDFCsv = spark.read.format("csv")
   .option("sep", "|")            // pipe-delimited files
   .option("inferSchema", "true") // let Spark work out the column types
   .option("header", "false")
   .load("/mnt/rawdata/2019/01/01/client/ABC*.gz") // note the leading slash on the DBFS mount; the glob picks up every ABC* file

myDFCsv.show()
myDFCsv.count()
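
To also capture which file each row came from, which was the whole point, Spark's built-in input_file_name() function can add the source path as a column; a minimal sketch:

import org.apache.spark.sql.functions.input_file_name

// Add a column holding the full path of the file each row was read from.
val myDFCsvWithFile = myDFCsv.withColumn("file_name", input_file_name())
myDFCsvWithFile.show()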

Thanks for pointing me in the right direction, mauridb!!

– ASH