
I know this question has been asked multiple times, but none of the solutions has actually worked for me so far.

I would like to pull some files into a web tool based on a URL.

This seems to be an FTP share, but using

import ftplib
url = 'ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS1167'
ftp = ftplib.FTP(url)

fails with:

gaierror: [Errno -2] Name or service not known

It is easy to download single files with the wget package:

import wget
wget.download(url + '/' + filename, out=ms_dir)

However, the Python wget package does not implement all the features of the Linux tool, so something like wget.download(url+'/*.*', out=ms_dir) (wildcard downloads) does not work.
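(An aside that may help: the standard library's `urllib.request.urlretrieve` does accept `ftp://` URLs for single files, unlike requests/urllib3 — a minimal sketch; the helper name and arguments are illustrative, not from the original post:)

```python
from urllib.request import urlretrieve

def fetch_ftp_file(base_url, filename, out_path):
    # urlretrieve supports ftp:// URLs directly,
    # which requests and urllib3 do not
    urlretrieve(base_url + '/' + filename, out_path)
```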

Therefore, I need to pull the list of files that I want to download first and then download them one by one. I tried BeautifulSoup, requests, and urllib, but all the solutions either seem over-complicated for a problem that was probably solved a million times ten years ago, or don't work at all.

However, for example,

import requests
response = requests.get(url, params=params)

raises:

InvalidSchema: No connection adapters were found for...

and

import urllib3
http = urllib3.PoolManager()
r = http.request('GET', url)

raises:

URLSchemeUnknown: Not supported URL scheme ftp

And so on. I am not sure what I am doing wrong here.

  • Looking at the `ftplib` docs, it gives this example:

        >>> from ftplib import FTP
        >>> ftp = FTP('ftp.python.org')  # connect to host, default port
        >>> ftp.login()  # default, i.e.: user anonymous, passwd anonymous@

    – astrochun Feb 04 '21 at 01:29
  • How is that different from what I wrote above? I am getting an error message already at FTP(url). You mean I have to use FTP('ftp.ebi.ac.uk')? That I can try; that is probably what I did wrong. – Soerendip Feb 04 '21 at 03:02

1 Answer

import ftplib
from urllib.parse import urlparse

def get_files_from_ftp_directory(url):
    # Split the URL into host name (netloc) and directory path
    url_parts = urlparse(url)
    ftp = ftplib.FTP(url_parts.netloc)  # connect with the host name only
    ftp.login()                         # anonymous login
    ftp.cwd(url_parts.path)             # change into the target directory
    filenames = ftp.nlst()              # list the directory contents
    ftp.quit()
    return filenames

url = 'ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS1167'
get_files_from_ftp_directory(url)

Thanks, I was using the whole URL instead of just the domain to log in. I use this function to get the filenames and then download them with the more comfortable wget API.
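The two steps above — listing the directory, then downloading the files one by one — can also be sketched end-to-end with `ftplib` alone, no wget needed; the function name and `out_dir` parameter are illustrative, not from the answer:

```python
import ftplib
import os
from urllib.parse import urlparse

def download_ftp_directory(url, out_dir):
    """List all files in an FTP directory and download them one by one."""
    parts = urlparse(url)
    os.makedirs(out_dir, exist_ok=True)
    ftp = ftplib.FTP(parts.netloc)  # host name only, not the full URL
    ftp.login()                     # anonymous login
    ftp.cwd(parts.path)
    for filename in ftp.nlst():
        local_path = os.path.join(out_dir, filename)
        with open(local_path, 'wb') as f:
            # RETR streams the remote file in binary mode
            ftp.retrbinary('RETR ' + filename, f.write)
    ftp.quit()
```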
