Azure Synapse Analytics Dedicated SQL Pool: How can I create an external table as select ( CETAS ) and partition by CustomerID?

Question

I can create CETAS tables in Azure Blob storage fairly easily ( link ), however, is there a simple way to automatically partition by CustomerID so that the rows associated with each customer will end up in their own folder?

Essentially I'm looking for a CETAS parameter that will cause my data to end up in folders like this:

Azure Blob Storage Container/
    CustomerTable/
        CustomerID=1/
            .parquet files for Customer 1
        CustomerID=2/
            .parquet files for Customer 2
        ...
        CustomerID=N/
            .parquet files for Customer N

(Note: I'm running around 10,000 CETAS statements on 100+ TB of data, I imagine I can change my script to look for tables that have a CustomerID column and manually partition the data by CustomerID, however, I was hoping there was some syntax with the CETAS statement that I could have the system partition the data for me).

I ended up doing this manually. – Colorado Techie Jun 29 '23 at 22:00 — Colorado Techie, Jun 29 '23 at 22:00

Subash · Answer 1 · 2023-04-15T09:56:13.233

0

Not with CETAS.But you can try using Delta lake

CREATE TABLE CustomerTable
USING delta
PARTITIONED BY (CustomerID)
LOCATION 'abfss://<></>'
AS SELECT /*/ FROM <Externam TAble already created>

And then create a view of it

CREATE VIEW Customer.vwCustomerData
AS
SELECT *
FROM 
OPENROWSET(
        BULK '<></>',
        DATA_SOURCE = 'ExternalDataSourceDataLake',
        FORMAT='DELTA'
    ) dim

Note: External Table to be used

edited Apr 15 '23 at 09:56

answered Apr 15 '23 at 08:43

Subash

887
1
8
19

Do you have any links to the documentation showing CTAS with a USING or PARTITIONED BY clauses? I looked at https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse?view=aps-pdw-2016-au7 and https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-azure-sql-data-warehouse?view=aps-pdw-2016-au7 but I don't see the documentation on how to use it? (When I tried what you suggested I get Parse error at line: 2, column: 1: Incorrect syntax near 'USING'. – Colorado Techie Apr 26 '23 at 17:05

Azure Synapse Analytics Dedicated SQL Pool: How can I create an external table as select ( CETAS ) and partition by CustomerID?

1 Answers1