1

I can create CETAS tables in Azure Blob storage fairly easily ( link ), however, is there a simple way to automatically partition by CustomerID so that the rows associated with each customer will end up in their own folder?

Essentially I'm looking for a CETAS parameter that will cause my data to end up in folders like this:

Azure Blob Storage Container/
    CustomerTable/
        CustomerID=1/
            .parquet files for Customer 1
        CustomerID=2/
            .parquet files for Customer 2
        ...
        CustomerID=N/
            .parquet files for Customer N

(Note: I'm running around 10,000 CETAS statements on 100+ TB of data, I imagine I can change my script to look for tables that have a CustomerID column and manually partition the data by CustomerID, however, I was hoping there was some syntax with the CETAS statement that I could have the system partition the data for me).

Colorado Techie
  • 1,302
  • 1
  • 13
  • 21

1 Answers1

0

Not with CETAS.But you can try using Delta lake

CREATE TABLE CustomerTable
USING delta
PARTITIONED BY (CustomerID)
LOCATION 'abfss://<></>'
AS SELECT /*/ FROM <Externam TAble already created>

And then create a view of it

CREATE VIEW Customer.vwCustomerData
AS
SELECT *
FROM 
OPENROWSET(
        BULK '<></>',
        DATA_SOURCE = 'ExternalDataSourceDataLake',
        FORMAT='DELTA'
    ) dim

Note: External Table to be used

Subash
  • 887
  • 1
  • 8
  • 19
  • Do you have any links to the documentation showing CTAS with a USING or PARTITIONED BY clauses? I looked at https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse?view=aps-pdw-2016-au7 and https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-azure-sql-data-warehouse?view=aps-pdw-2016-au7 but I don't see the documentation on how to use it? (When I tried what you suggested I get Parse error at line: 2, column: 1: Incorrect syntax near 'USING'. – Colorado Techie Apr 26 '23 at 17:05