0

I have a requirement to exclude certain folder from prefix and process the data in snowflake (Copy statement)

In the below example I need to process files under emp/ and exclude files from abc/

Input :

s3://bucket1/emp/

E1.CSV
E2.CSV
/abc/E11.csv

s3://bucket1/emp/abc/ - E11.csv

Output :

s3://bucket1/emp/

E1.CSV
E2.CSV

Is there any suggestion around pattern to handle this ?

2 Answers2

2

With the pattern keyword you can try to exclude certain files. However when using the pattern matching with the NOT syntax, you exclude any file with any of the characters.

Assuming your stage URL is defined as s3://bucket1/emp/

LS @MY_STAGE pattern = '[^abc].*'; 
  • Excludes anything starting with a, b, or c
LS @MY_STAGE pattern = '[^a][^b][^c][^\\/].*';  
  • Excludes anything where:
    • The first character is a, OR
    • The second character is b, OR
    • The third character is c, OR
    • The fourth character is a forward slash /

Edit

After testing with Sharvan's example. Here is what I've found:

Doesn't work: ls @my_stage PATTERN='^((?!/abc/).)*$'; because the first forward slash is duplicated as part of the stage URL (it is automatically appended to the stage URL if not present)

Works: ls @my_stage PATTERN='^((?!abc/).)*$'; because the first forward slash is removed

Updated as the forward slash does not need to be escaped

Snowflake does not support backreferences (per their documentation) but there is no mention of lookaheads or lookbehinds, which I thought was un-supported.

https://docs.snowflake.net/manuals/sql-reference/functions-regexp.html#backreferences

Chris
  • 680
  • 3
  • 6
1

Use this to exclude the prefix pattern

ls @stage PATTERN='^((?!/abc/).)*$'
Sharvan Kumar
  • 179
  • 1
  • 2
  • 9