I am setting up a new GCP project to read and parse a CSV file as soon as it is uploaded to a bucket. To that end, I have created a Cloud Storage trigger that publishes to a Pub/Sub topic, and Pub/Sub in turn delivers the messages to a background function.
Everything seems to be working: as soon as a file is uploaded, the trigger fires and sends a message to Pub/Sub, which then invokes the function. I can also see the message coming through to the function.
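For reference, a setup like the one described can be created with commands along these lines (the bucket, topic, and function names here are placeholders, not the ones from my project):

```shell
# Publish an OBJECT_FINALIZE notification to a Pub/Sub topic
# whenever an object is created anywhere in the bucket.
gsutil notification create -t csv-uploads -f json \
    -e OBJECT_FINALIZE gs://my-upload-bucket

# Deploy the background function subscribed to that topic.
gcloud functions deploy parse_data \
    --runtime python39 \
    --trigger-topic csv-uploads
```

Note that `gsutil notification create` also accepts a `-p` object-name prefix, which restricts the notification to objects under that prefix.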
The problem, however, is acknowledging the message back to Pub/Sub. Somewhere I read that returning any 2xx status should do the job (i.e. remove the message from the queue), but it does not. As a result, Pub/Sub "thinks" the message has not been delivered and redelivers it over and over again.
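For context, what the function receives is the Cloud Storage event, base64-encoded inside the Pub/Sub envelope. The decoding step can be reproduced locally like this (the sample event below is made up for illustration):

```python
import base64
import json

# A made-up Cloud Storage "finalize" event, JSON-encoded and then
# base64-encoded, the way it arrives inside the Pub/Sub envelope.
event_json = json.dumps({"bucket": "my-upload-bucket", "name": "incoming/data.csv"})
envelope = {"data": base64.b64encode(event_json.encode("utf-8")).decode("utf-8")}

# What the function does on receipt: base64-decode, then parse the JSON.
decoded = base64.b64decode(envelope["data"]).decode("utf-8")
properties = json.loads(decoded)

print(properties["bucket"])  # my-upload-bucket
print(properties["name"])    # incoming/data.csv
```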
import base64
import json

def parse_data(data, context):
    """Background function triggered by a Pub/Sub message."""
    if 'data' in data:
        # The payload is a base64-encoded JSON description of the GCS object.
        args = base64.b64decode(data['data']).decode('utf-8')
        pubsub_message = args.replace('\n', ' ')
        properties = json.loads(pubsub_message)
        myBucket = validate_message(properties, 'bucket')
        myFileName = validate_message(properties, 'name')
        fileLocation = 'gs://' + myBucket + '/' + myFileName
        readAndEnhanceData(fileLocation)
        return 'OK', 200
    else:
        return 'Something went wrong, no data received'
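`validate_message` is my own helper; for completeness, a minimal sketch of what it could look like (this is an assumption, not the exact implementation):

```python
def validate_message(properties, field):
    # Hypothetical helper: pull a field out of the decoded event,
    # failing loudly if it is missing so the error shows up in the logs.
    value = properties.get(field)
    if value is None:
        raise ValueError(f"Missing expected field '{field}' in message")
    return value
```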
Here is the log file, which shows the function being called continuously:
D CSV_Parser_Raw_Data 518626734652287 Function execution took 72855 ms,
finished with status: 'ok' CSV_Parser_Raw_Data 518626734652287
D CSV_Parser_Raw_Data 518626708442766 Function execution took 131886 ms,
finished with status: 'ok' CSV_Parser_Raw_Data 518626708442766
D CSV_Parser_Raw_Data 518624470100006 Function execution took 65412 ms,
finished with status: 'ok' CSV_Parser_Raw_Data 518624470100006
D CSV_Parser_Raw_Data 518626734629237 Function execution took 68004 ms,
finished with status: 'ok' CSV_Parser_Raw_Data 518626734629237
D CSV_Parser_Raw_Data 518623777839079 Function execution took 131255 ms,
finished with status: 'ok' CSV_Parser_Raw_Data 518623777839079
D CSV_Parser_Raw_Data 518623548622842 Function execution took 131186 ms,
finished with status: 'ok' CSV_Parser_Raw_Data 518623548622842
D CSV_Parser_Raw_Data 518623769252453 Function execution took 133981 ms,
finished with status: 'ok' CSV_Parser_Raw_Data 518623769252453
So I would be grateful to know what I am missing here, i.e. how can I break this loop?
* UPDATE on the Issue * Thanks to @kamal, who forced me to open my eyes: I set out to recreate the buckets/topics etc., and while doing so I re-reviewed everything and realised I was writing a temporary file to a sub-folder of the SAME bucket as the uploaded files! That was the issue: the finalize event fires for ANY object created ANYWHERE in the bucket. So Kamal was right, multiple uploads were taking place.
If you are tackling your project in the same way, create a tmp folder and make sure you do not add ANY trigger to that folder.
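Since a bucket-level notification fires for every object, one defensive option is to ignore your own temporary files inside the function itself. A minimal sketch (the `tmp/` prefix is an assumption about your folder layout):

```python
TMP_PREFIX = 'tmp/'  # assumed folder where the function writes intermediate files

def should_process(object_name):
    # Skip anything the function itself wrote under the tmp folder, so
    # events triggered by those writes do not start another run.
    return not object_name.startswith(TMP_PREFIX)

print(should_process('incoming/data.csv'))  # True
print(should_process('tmp/partial.csv'))    # False
```

With a guard like this at the top of `parse_data`, events for the temporary objects return immediately instead of re-entering the processing loop.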