2

I am a beginner in Python programming but I believe the problem I am trying to solve might not be a big one.

I am working on a program that presents the last row of the latest CSV file to the end user. At the moment I copy and paste the name of the latest file from the FTP directory into the call by hand, for example:

pd.read_csv("ftp://123.4.567.890/folder1/folder2/123.csv")

where 123.csv is the latest file. How can I pass the latest file to the pandas read_csv() function automatically? In addition, I am using Jupyter Notebook, but I am unable to change the working directory from my local OS to the FTP server; being able to do that might be very helpful. The files in the FTP directory are listed as below, with no column names:

02/03/2021 12:00AM         37,471 312.csv
02/03/2021 12:00AM         24,138 312.raw
01/26/2021 12:00AM         31,246 612.csv
01/26/2021 12:00AM         19,098 2612.raw
02/01/2021 12:00AM         15,337 0100.csv
02/01/2021 12:00AM          9,858 0100.raw
02/02/2021 12:00AM        134,098 0112.csv

How can I fetch the latest CSV file from the listing above?

I would really appreciate your help.

Thanks

Martin Prikryl
Nishant

3 Answers

0

Pandas can read CSV files directly using the FTP protocol (it's not limited to just the HTTP/HTTPS protocol).

You need to make sure your FTP URL is correct and refers to the latest file. (The IP address in your question is not valid: an IPv4 address consists of four 8-bit numbers, so each part can be at most 255.) You may need to do some processing if the latest file doesn't have a standardised name. If you have control over the server, you could add a link with a fixed name, e.g. ftp://servername/latest.csv

Alternatively, you could do this dynamically on the client, using Python:

import ftplib

FTP_HOST = 'ftp.ifremer.fr'
FTP_DIR = '/ifremer/argo/etc/ObjectiveAnalysisWarning/incois/'

# connect to the remote server using anonymous FTP
ftp = ftplib.FTP(FTP_HOST, 'anonymous', '')
# change the remote working directory
ftp.cwd(FTP_DIR)
# load the modification dates for each file
results = [(name, ftp.voidcmd("MDTM " + name), ) for name in ftp.nlst()]
# sort by modification date
results.sort(key=lambda x: x[1])
# get the filename for the most recently modified file
most_recent_filename = results[-1][0]
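
The snippet above picks the newest entry of any type, while the question asks for the newest CSV specifically. As a rough sketch (not part of the original answer), the (name, reply) pairs could be filtered and compared as real timestamps. The helper names parse_mdtm and latest_csv are hypothetical, the sample replies are made up from the dates in the question, and the code assumes the server answers MDTM in the usual "213 YYYYMMDDHHMMSS" form:

```python
from datetime import datetime


def parse_mdtm(response):
    """Turn an MDTM reply such as '213 20210203000000' into a datetime."""
    # The reply is '<status code> <timestamp>'; keep only the first 14
    # characters of the timestamp so optional fractional seconds are ignored.
    timestamp = response.split(maxsplit=1)[1].strip()[:14]
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def latest_csv(results):
    """Return the name of the most recently modified .csv file.

    `results` is a list of (name, MDTM reply) pairs, as built above.
    """
    csv_files = [
        (name, parse_mdtm(reply))
        for name, reply in results
        if name.lower().endswith(".csv")
    ]
    if not csv_files:
        raise ValueError("no CSV files in the listing")
    return max(csv_files, key=lambda item: item[1])[0]


# Made-up replies for illustration only (assumption: not real server output).
sample = [
    ("312.csv", "213 20210203000000"),
    ("312.raw", "213 20210203000000"),
    ("612.csv", "213 20210126000000"),
    ("0112.csv", "213 20210202000000"),
]
print(latest_csv(sample))
```

With a live connection, latest_csv(results) could stand in for results[-1][0], provided results is built as in the snippet above.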

Here is an example of using pandas to download a CSV from a publicly available FTP source:

import pandas as pd
df = pd.read_csv('ftp://ftp.ifremer.fr/ifremer/argo/etc/ObjectiveAnalysisWarning/incois/ar_scoop2_IN_20130722123443.csv')

To download the most recent file identified by the earlier code using pandas:

import posixpath  # FTP paths always use forward slashes, unlike os.path on Windows

df = pd.read_csv(
    'ftp://' + FTP_HOST + posixpath.join(FTP_DIR, most_recent_filename)
)

Replace the host and directory with your own valid values and you should have the DataFrame that you need.

To get the last row of a DataFrame:

df.iloc[-1, :]
moo
0

There's no magic solution that will have Pandas load the latest file from an FTP server.

You need to split your task into two steps:

  1. Finding the latest file on the FTP server:
    Python FTP get the most recent file by date

    Your server seems to be IIS. IIS does not support MLSD, and your IIS server is configured to use DOS-style listing. Most code you will find for parsing a LIST response is written for *nix servers, so unless you can configure your IIS to use *nix-style listing, most of that code won't work with your server. You will either have to adjust the code or use the less efficient MDTM solution (which should be fine if there are only a few files).

  2. Loading that file to Pandas (you have that already).
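
As a rough illustration of "adjusting the code" for a DOS-style listing, the sketch below parses lines shaped like the ones in the question and picks the newest .csv file. The helper names parse_listing_line and latest_csv_from_listing are hypothetical, and the pattern assumes month/day/year dates with a 12-hour clock, exactly as in the question's sample. A raw LIST response from the server may use a different date format, in which case the regular expression and the strptime pattern would need adjusting.

```python
import re
from datetime import datetime

# Matches lines shaped like the listing in the question, e.g.
# "02/03/2021 12:00AM         37,471 312.csv"
# Directory entries (which show "<DIR>" instead of a size) do not match.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4})\s+(\d{1,2}:\d{2}[AP]M)\s+([\d,]+)\s+(.+)$"
)


def parse_listing_line(line):
    """Return (name, modified) for a file line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    date_part, time_part, _size, name = match.groups()
    # Assumption: dates are month/day/year, as the question's sample suggests.
    modified = datetime.strptime(date_part + " " + time_part, "%m/%d/%Y %I:%M%p")
    return name, modified


def latest_csv_from_listing(lines):
    """Return the name of the most recently modified .csv file in the listing."""
    entries = [parse_listing_line(line) for line in lines]
    csv_files = [
        entry for entry in entries
        if entry is not None and entry[0].lower().endswith(".csv")
    ]
    if not csv_files:
        raise ValueError("no CSV files in the listing")
    return max(csv_files, key=lambda entry: entry[1])[0]


# The listing from the question, used here as sample input.
listing = [
    "02/03/2021 12:00AM         37,471 312.csv",
    "02/03/2021 12:00AM         24,138 312.raw",
    "01/26/2021 12:00AM         31,246 612.csv",
    "01/26/2021 12:00AM         19,098 2612.raw",
    "02/01/2021 12:00AM         15,337 0100.csv",
    "02/01/2021 12:00AM          9,858 0100.raw",
    "02/02/2021 12:00AM        134,098 0112.csv",
]
print(latest_csv_from_listing(listing))
```

With a live connection, the lines could be collected with ftplib, for example by passing lines.append as the callback to ftp.retrlines("LIST", ...), and the returned name then appended to the FTP URL given to pd.read_csv.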

Martin Prikryl
-1

I haven't written code for exactly this CSV case, but here is something similar to your question.

I had the same problem when opening a Notepad file: I had to copy the path to the new file each time. Here is the code I wrote to overcome the problem.

filename = str(input("Please input the file name: "))

newfile = str(filename + ".txt")

import subprocess

subprocess.Popen(["notepad",newfile])

The code lets you enter the Notepad file name (e.g. Trial 1) and concatenates it with .txt. The concatenated string is then used as the path to the file, which is then opened using subprocess.Popen.

This worked excellently for me. I know it may not be relevant to your question, but I hope it helps.

Regards

Reema Q Khan