
I have a deployed DAG in which I'm using check_for_wildcard_key() to check if files for a particular day are present in an S3 location and then decide which branch to return. Following is the code:

from airflow.hooks import S3_hook
def checkforfiles(**kwargs):
    hook = S3_hook.S3Hook('s3_dummy')
    if hook.check_for_wildcard_key(f"s3://bucket/fixed/path/that/stays/the/same/daily/{sub_path_that_changes_daily}/*"):
        return 'branch1'
    else:
        return 'end'

The problem is that even though the files are present at their locations, 'end' is returned all the time. I want to test what's happening here on my local system, since there is virtually no useful logging in Airflow. I have the access_key and secret_key for the bucket; how do I pass them to this S3 hook? As of now I get the following error:

AirflowNotFoundException: The conn_id `s3_dummy` isn't defined

I tried using mock but couldn't figure out a way to get it to work. Any help would be appreciated. *Edit: I found the bug in my code without needing a mock connection, but let's keep this thread open to help others in need.

1 Answer


You are using a deprecated import. You need to import the hook from the Amazon provider as:

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

The error means that you didn't define the connection for that conn_id. You need to follow this guide and set up the connection. You can also check creating boto3 s3 client on Airflow with an s3 connection and s3 hook for reference.
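
If you only have an access_key and secret_key, one way to make the missing s3_dummy connection visible for a local run is through an environment variable, since Airflow also resolves connections from AIRFLOW_CONN_<CONN_ID>. A minimal sketch, with placeholder credentials:

import os

from airflow.models.connection import Connection

# Sketch: the AWS access key id goes in `login`, the secret key in `password`.
conn = Connection(
    conn_id="s3_dummy",
    conn_type="aws",
    login="YOUR_ACCESS_KEY_ID",         # placeholder
    password="YOUR_SECRET_ACCESS_KEY",  # placeholder
)

# Makes S3Hook('s3_dummy') resolvable without a metadata DB entry.
os.environ["AIRFLOW_CONN_S3_DUMMY"] = conn.get_uri()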

If you are looking to mock a connection you can for example do:

from unittest import mock

from airflow.models.connection import Connection

conn = Connection(
    conn_type="gcpssh",
    login="cat",
    host="conn-host",
)
conn_uri = conn.get_uri()
with mock.patch.dict("os.environ", AIRFLOW_CONN_MY_CONN=conn_uri):
    assert "cat" == Connection.get_connection_from_secrets("my_conn").login

This is explained in the Mocking variables and connections section of the Airflow docs.
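
Adapted to an AWS connection, a local test might look like the sketch below (credentials, bucket name and key prefix are placeholders):

from unittest import mock

from airflow.models.connection import Connection
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Sketch: same pattern as above, with the access key as login and the secret key as password.
conn_uri = Connection(
    conn_type="aws",
    login="YOUR_ACCESS_KEY_ID",
    password="YOUR_SECRET_ACCESS_KEY",
).get_uri()

with mock.patch.dict("os.environ", AIRFLOW_CONN_S3_DUMMY=conn_uri):
    hook = S3Hook(aws_conn_id="s3_dummy")
    # Bucket name and key prefix are placeholders for your own values.
    print(hook.check_for_wildcard_key("some/prefix/*", bucket_name="some-bucket"))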

You can see examples of how to use check_for_wildcard_key in the provider's test file.
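
For reference, the call shapes exercised in those tests look roughly like this (bucket and keys are placeholders, and hook is an S3Hook as in the sketch above):

# Wildcard key given relative to the bucket:
hook.check_for_wildcard_key("some/prefix/*", bucket_name="some-bucket")

# A full s3:// URL also works; the hook splits it into bucket and key itself:
hook.check_for_wildcard_key("s3://some-bucket/some/prefix/*")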

Elad Kalif
  • Thank you for the answer Elad, but I already went through all of these resources before coming here, since none of them helped my case. As I mentioned in the question, I only have the access_key and secret_key for the bucket and do not have login or host values. Passing them in "Extras" doesn't help either. – TwerkingPanda May 13 '22 at 13:38
  • The guide says to place the access_key in the login and the secret_key in the password. If that didn't work, please clarify what the issue is – Elad Kalif May 13 '22 at 13:55