Some files get uploaded on a daily basis to an FTP server and I need those files under Google Cloud Storage. I don't want to bug the users that upload the files to install any additional software and just let them keep using their FTP client. Is there a way to use GCS as an FTP server? If not, how can I create a job that periodically picks up the files from an FTP location and puts them in GCS? In other words: what's the best and simplest way to do it?

-
Seems like one way is to set up an FTP server on a VM and use gcsfs to connect this server to GCS, as described here http://ilyapimenov.com/blog/2015/01/19/ftp-proxy-to-gcs.html - does this work for you? – jkff Apr 19 '17 at 04:59
4 Answers
You could write yourself an FTP server which uploads to GCS, for example based on pyftpdlib
Define a custom handler which stores to GCS when a file is received

    import os

    from google.cloud import storage
    from pyftpdlib.authorizers import DummyAuthorizer
    from pyftpdlib.handlers import FTPHandler
    from pyftpdlib.servers import FTPServer


    class MyHandler(FTPHandler):
        def on_file_received(self, file):
            # Called after a client finishes an upload; `file` is the local path.
            storage_client = storage.Client()
            bucket = storage_client.get_bucket('your_gcs_bucket')
            blob = bucket.blob(file[5:])  # strip leading /tmp/
            blob.upload_from_filename(file)
            os.remove(file)

        # Override other on_* callbacks (e.g. on_incomplete_file_received) as needed.


    def main():
        authorizer = DummyAuthorizer()
        authorizer.add_user('user', 'password', homedir='/tmp', perm='elradfmw')

        handler = MyHandler
        handler.authorizer = authorizer
        # Placeholder: replace with the VM's external IP, as a string.
        handler.masquerade_address = 'add.your.public.ip'
        handler.passive_ports = range(60000, 60999)

        # 127.0.0.1 only accepts local connections; bind to '0.0.0.0' for external clients.
        server = FTPServer(("127.0.0.1", 21), handler)
        server.serve_forever()


    if __name__ == "__main__":
        main()

I've successfully run this on Google Container Engine (it requires some effort getting passive FTP working properly) but it should be pretty simple to do on Compute Engine. According to the above configuration, open port 21 and ports 60000 - 60999 on the firewall.
To run it: `python my_ftp_server.py`. If you want to listen on port 21 you'll need root privileges.

-
Where should this file go? How do users authenticate with the FTP client (what are the host name, user, pass)? – CCC Apr 19 '17 at 16:12
-
I think crazystick is suggesting that the user authenticates with the FTP server however you like, and the FTP server, which you're running, has credentials to upload the objects to GCS. So you write to FTP server, FTP server forwards that upload stream on to GCS. – Brandon Yarbrough Apr 19 '17 at 17:12
-
Yes - look at the docs for pyftpdlib and you will find a number of options for authentication. In the example above, everyone would connect to the FTP server using username "user" and password "password", and all files get dumped in the same GCS bucket with default security. Running on Compute Engine / Container Engine gets you credentials for GCS – crazystick Apr 20 '17 at 10:58
-
Thank you... I know I might be asking for too much, but where should this file go and what config changes need to be done in the VM in Compute Engine? Also, should there be any consideration for passive FTP and connecting to the external IP? – CCC Apr 21 '17 at 15:43
-
I added a couple of extra config options you'll probably want to run it on GCE. To have it start automatically you would have to write a systemd service for it. That should be pretty trivial and there are plenty of resources explaining how. – crazystick Apr 21 '17 at 19:55
-
Make sure you use a recent version of pyftpdlib! You want a version that includes [my patch](https://github.com/giampaolo/pyftpdlib/issues/414) to gracefully handle I/O errors. The [GCS SLA](https://cloud.google.com/storage/sla) allows for failed writes, and they do happen from time to time (about once every other month for us, we write about 5 files a minute). The patch will be in version 1.5.3 which is not released yet. – Emil Vikström Jun 26 '17 at 08:49
-
I started integrating pyftpdlib and pyfilesystem2, only to find out that pyftpdlib is very tightly coupled to platform dependent calls such as `os.path.join` and `os.path.realpath`. They have an AbstractedFS class but they don't use it everywhere. Did anyone have success using pyftpdlib as a proxy to any storage backend? – Tamas Hegedus Jul 17 '20 at 18:35
-
I have gotten it to work on 127.0.0.1. However, I'm not able to try it externally using the masquerade_address – GILO Sep 06 '21 at 20:06
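Regarding the comment above about occasional failed GCS writes: one way to make the handler more tolerant is to wrap the upload in a small retry loop. The following is only a sketch under assumptions. The helper name `upload_with_retry`, the attempt count, and the delays are made up for illustration and are not part of pyftpdlib or google-cloud-storage.

```python
import time


def upload_with_retry(upload, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call upload() until it succeeds or the attempts run out.

    upload: zero-argument callable, for example
            lambda: blob.upload_from_filename(path)
    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts and re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except Exception:
            # Deliberately broad for the sketch; narrow this to the errors
            # you actually see in production.
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Inside `on_file_received` this would replace the direct call, e.g. `upload_with_retry(lambda: blob.upload_from_filename(file))`, so that the local file is only removed once the upload has actually gone through.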
You could set up a cron job that syncs the FTP server's upload directory to Google Cloud Storage using gsutil rsync or the open-source rclone tool.
If you can't run those commands on the FTP server itself, you could mount the FTP server as a local filesystem or drive (Linux, Windows) on another machine and run them there.
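If neither gsutil nor rclone fits, the "periodic job" can also be a short Python script started by cron. The sketch below uses the standard-library ftplib to list and download files and google-cloud-storage to upload them. The function names and the in-memory `already_copied` set are assumptions made for illustration; a real job would have to persist that set between runs or check the bucket instead, and would need to skip subdirectories, which `nlst()` may also return.

```python
import ftplib
import io


def copy_new_files(ftp, upload, already_copied):
    """Copy files from the FTP working directory that were not copied before.

    ftp: connected ftplib.FTP-like object (needs nlst() and retrbinary()).
    upload: callable(name, data_bytes) that stores one object, e.g. in GCS.
    already_copied: set of names handled on earlier runs; updated in place.
    Returns the list of names copied during this call.
    """
    copied = []
    for name in ftp.nlst():
        if name in already_copied:
            continue
        buffer = io.BytesIO()
        ftp.retrbinary("RETR " + name, buffer.write)  # download into memory
        upload(name, buffer.getvalue())
        already_copied.add(name)
        copied.append(name)
    return copied


def make_gcs_uploader(bucket_name):
    """Return an upload(name, data) callable backed by google-cloud-storage."""
    # Imported lazily so the rest of the module works without the package.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)

    def upload(name, data):
        bucket.blob(name).upload_from_string(data)

    return upload


def run_once(host, user, password, bucket_name, already_copied):
    """One polling pass: connect, copy whatever is new, disconnect."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        uploader = make_gcs_uploader(bucket_name)
        return copy_new_files(ftp, uploader, already_copied)
```

A cron entry would then call something like `run_once("ftp.example.com", "user", "password", "your_gcs_bucket", seen)`, where the host name is a placeholder. Files are buffered in memory here, which is fine for small daily uploads but not for very large files.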

-
You would need to mount the bucket as a filesystem somewhere for example by using gcs-fuse https://cloud.google.com/storage/docs/gcs-fuse – Lukasz Cwik Oct 11 '19 at 18:08
I have successfully set up an FTP proxy to GCS using gcsfs in a VM in Google Compute (mentioned by jkff in the comment to my question), with these instructions: http://ilyapimenov.com/blog/2015/01/19/ftp-proxy-to-gcs.html
Some changes are needed though:
- In /etc/vsftpd.conf change #write_enable=YES to write_enable=YES
- Add firewall rules in your GC project to allow access to port 21 and passive ports 15393 to 15592 (https://console.cloud.google.com/networking/firewalls/list)
Some possible problems:
- If you can access the FTP server using the local ip, but not the remote ip, it's probably because you haven't set up the firewall rules
- If you can access the ftp server, but are unable to write, it's probably because you need the write_enable=YES
- If you are trying to read the folder you created on /mnt, but get an I/O error, it's probably because the bucket in gcsfs_config is not right.
Also, your ftp client needs to use the transfer mode set to "passive".
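To tell the firewall and passive-mode problems apart from a client machine, a quick check with Python's standard-library ftplib can help. This is only a sketch; the function name `check_ftp` and its return format are assumptions made for illustration. Note that some servers answer a listing of an empty directory with an error, which this check would report as a failure.

```python
import ftplib


def check_ftp(host, user, password, port=21, timeout=10):
    """Try a passive-mode login and listing; return (ok, message)."""
    ftp = ftplib.FTP()
    try:
        # Control connection: fails here if port 21 is blocked.
        ftp.connect(host, port, timeout=timeout)
        ftp.login(user, password)
        ftp.set_pasv(True)  # use passive mode for data connections
        # The listing opens a data connection on one of the passive ports,
        # so a timeout here points at the passive port firewall rule.
        names = ftp.nlst()
        ftp.quit()
        return True, "listed %d entries" % len(names)
    except ftplib.all_errors as exc:
        ftp.close()
        return False, "%s: %s" % (type(exc).__name__, exc)
```

Running it once against the VM's internal IP and once against the external IP shows whether a failure comes from the server configuration or from the firewall rules.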
-
We did this but had huge amounts of intermittent errors with all ready-made FTP solutions. The only thing that worked out in the end was pyftpdlib, which we do run on a FUSE-mounted GCS. – Emil Vikström Jun 26 '17 at 08:42
-
We tried that as well, but we sometimes get errors from gcsfuse dropping the connection, so I wouldn't suggest running it in production – ale_tri Mar 19 '21 at 11:30
-
Set up a VM in the google cloud, using some *nix flavor. Set up ftp on it, and point it to a folder abc. Use google fuse to mount abc as a GCS bucket. Voila - back and forth between gcs / ftp without writing any software. (Small print: fuse rolls up and dies if you push too much data, so bounce it periodically, once a week or once a day; also you might need to set the mount or fuse to allow permissions for all users)
