0

We are trying to load data from an S3 bucket into Snowflake using COPY INTO. Works perfectly.. But data in subfolders are also being copied, and this shoud not not happen.

Following hardcoded pattern REGEX works perfectly

copy into TARGETTABLE
from @SOURCESTAGE
pattern='^(?!.*subfolder/).*$'

But we don't want to hardcode the foldername. When I just keep the '/' it doesn't work anymore.. ( same happens when I escape the slash \/ )

copy into TARGETTABLE
from @SOURCESTAGE
pattern='^(?!.*/).*$'

Does anybody knows which REGEX to use to skip any subfolder in the COP INTO in a dynamic way? (no hardcoding of folder name )

@test_stage/folder_include
@test_stage/folder_include/file_that_has_to_be_loaded.csv
@test_stage/folder_include/folder_exclude/file_that_cannot_be_loaded.csv

So only files within folder_include can be picked up by the copy into statement. Everything in a lower level needs to be skipped. Most importantly: without hardcoding on foldername. Any folder within folder_include has to be ignored.

Thanks!

  • I think this may have already been answered here: [How to avoid sub folders in snowflake copy statement](https://stackoverflow.com/questions/59417105/how-to-avoid-sub-folders-in-snowflake-copy-statement) – NickW Oct 01 '20 at 11:39
  • Hi Nick, thanks for answering. In fact it is not. This solution provides a hardcodes fix. Not a dynamic one. The solution shows how to skip subfolders by name. I need a more generic solution. – Benoit Turbang Oct 01 '20 at 11:41
  • Does the pattern you want to keep contain just a single slash character (i.e. subfolders contain 2 or more slashes)? If so this might help (obviously changing % to /): https://stackoverflow.com/questions/16762492/regex-to-match-only-if-symbol-in-string-used-once – NickW Oct 01 '20 at 13:13
  • Any folder within the source stage will contain at least 1 '/' . I tried it but it doesnt help. Thanks. – Benoit Turbang Oct 01 '20 at 14:25
  • If the top-level folder (the one you want to load from) contains 1 '/' then don't the subfolders have to contain more than 1 '/'? If that is the case then why does filtering on a single '/' not work? – NickW Oct 01 '20 at 15:01
  • i really don't know... it is kind of a mystery to me. Try it out.. you will be mind blown ;) – Benoit Turbang Oct 02 '20 at 07:52
  • @BenoitTurbang did you try to keep access on only required folder and revoke access on subfolders by using AWS IAM policy. – PIG Oct 02 '20 at 16:36
  • Can you give an actual example input? It's hard to know if my solution would work if I can't test it – Felipe Hoffa Oct 02 '20 at 20:40
  • @FelipeHoffa, any input can be used. As soon as you have folders containing files on same level of the files you want to process they will also be picked up. – Benoit Turbang Oct 04 '20 at 08:05

2 Answers2

0

Here (like mentioned in the comments) is a solution for skipping a hardcoded foldername: How to avoid sub folders in snowflake copy statement

In my opinion replacing the hardcoded-part with .* makes it generic.

Kind regards :)

Marcel
  • 2,454
  • 1
  • 5
  • 13
  • Hi Marcel, this will not work. I think the PATTERN regex will always check the entire PATH. Including everything that is defined in ur STAGE in backend. Replacing it with '.*' will also test positive on all other slashes in the path .. the result will be that nothing passes. – Benoit Turbang Oct 01 '20 at 12:52
0

If the PATH that's included in STAGE is static, you can include that in your pattern.

list @SOURCESTAGE PATTERN = 'full_path_to_folder_include/[^/]*'

Even if your path include environment specific folder (for eg. DEV, PROD), you can account for that:

list @SOURCESTAGE PATTERN = 'static_path/[^/]+/path_to_folder/[^/]*'
or
list @SOURCESTAGE PATTERN = 'static_path/(dev|test|prod)/path_to_folder/[^/]*'
karunP
  • 31
  • 4