
I have been using Google Cloud Functions (GCF) to set up a serverless environment. This works fine and covers most of the functionality I need.

However, for one specific module, extracting data from FTP servers, parsing the files from one provider takes longer than 540 s. As a result, the task gets timed out when deployed as a Cloud Function.

In addition, some FTP servers require whitelisting the IP address that makes the requests. With Cloud Functions this is not possible unless you somehow reserve a static address or a range.

I am therefore looking for an alternative solution to execute a Python script in the cloud on the Google platform. The requirements are:

  • It needs to support Python 3.7
  • It has to have the possibility to associate a static IP address to it
  • One execution should be able to take longer than 540s
  • Ideally, it should be possible to easily deploy the script (as it is the case with GCF)

What is the best option out there for this kind of need?

WJA

2 Answers


The notion of a Cloud Function is primarily that of a microservice ... something that runs for a relatively short period of time. In your story, we seem to have actions that can run for an extended period of time. This would seem to lend itself to the notion of running some form of compute engine. The two that immediately come to mind are Google Compute Engine (CE) and Google Kubernetes Engine (GKE).

Let us think about the Compute Engine. Think of this as a Linux VM over which you have 100% control. This needn't be a heavyweight thing ... Google provides micro compute engines which are pretty darn tiny. You can have one or more of these, including the ability to dynamically scale out the number of instances if load on the set becomes too high.

On your compute engine, you can create any environment you wish ... including installing a Python environment and running Flask (or another framework) to process incoming requests. You can associate your compute engine with a static IP address, or associate a static IP address with a load balancer front-ending your engines.
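As a rough illustration, here is a minimal sketch of what such a setup could look like on a Compute Engine instance. It assumes Flask is installed on the VM; the /import route and the run_ftp_import() helper are hypothetical placeholders for your existing FTP extraction code, not something prescribed by the answer above.

from flask import Flask, jsonify

app = Flask(__name__)


def run_ftp_import():
    # Hypothetical placeholder for the long-running FTP download and parsing
    # job that exceeds the 540 s Cloud Functions limit; on a VM there is no
    # such hard timeout.
    pass


@app.route('/import', methods=['POST'])
def trigger_import():
    # Kick off the long-running job when the endpoint is called.
    run_ftp_import()
    return jsonify(status='done')


if __name__ == '__main__':
    # Listen on all interfaces so the instance's (static) external IP can
    # receive requests from a scheduler or another service.
    app.run(host='0.0.0.0', port=8080)

A static external IP can be reserved and attached to the instance (via the console or gcloud), giving the FTP provider a fixed address to whitelist.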

Kolban
  • How does App Engine fit in this story? – WJA Jul 25 '19 at 15:33
  • App Engine is another compute story and, similar to Cloud Functions, spins up instances on demand. App Engine is solid and proven technology, but I have a loose thought that it has now been supplanted by newer technologies. It is also Google proprietary. – Kolban Jul 25 '19 at 15:52

Here is how I download files from an FTP server to Google Cloud Storage with Google Cloud Functions. It takes less than 30 seconds (depending on the file size).

# import libraries
from google.cloud import storage
import wget


def importFile(request):

    # set up the storage client
    client = storage.Client()

    # get the bucket (name without the gs:// prefix)
    bucket = client.get_bucket('BUCKET-NAME')
    blob = bucket.blob('file-name.csv')

    # see if the file already exists
    if not blob.exists():

        # copy the file to Google Cloud Storage
        try:
            link = 'ftp://account:password@ftp.domain.com/folder/file.csv'  # for non-public FTP files
            ftpfile = wget.download(link, out='/tmp/destination-file-name.csv')  # save the download in the /tmp folder of Cloud Functions
            blob.upload_from_filename(ftpfile)
            print('Copied file to Google Storage!')

        # print the error if the download or upload fails
        except Exception as error:
            print('An exception occurred: {}'.format(error))

    # print a message if the file already exists in Google Storage
    else:
        print('File already exists in Google Storage')
Gidi9