
There are a few answers on this topic already, but pretty much all of them say that it's bad to parse the output of ls -l, and therefore suggest other methods.

However, I'm using ncftpls -l, and so I can't use things like shell globs or find – I think I have a genuine need to actually parse the ls -l output. Don't worry if you're not familiar with ncftpls, the output returns in exactly the same format as if you were just using ls -l.

There is a list of files at a public remote ftp directory, and I don't want to burden the remote server by re-downloading each of the desired files every time my cronjob fires. I want to check, for each one of a subset of files within the ftp directory, whether the file exists locally; if not, download it.

That's easy enough, I just use

tdy="$(date -u '+%Y%m%d')_"

# Today's files
for i in $(ncftpls 'ftp://theftpserver/path/to/files' | grep "${tdy}"); do
    if [ ! -f "$i" ]; then
        ncftpget "ftp://theftpserver/path/to/files/${i}"
    fi
done

But I came upon the issue that sometimes the cron job will download a file that hasn't finished uploading, and so when it fires next, it skips the partially downloaded file.

So I wanted to add a check to make sure that for each file that I already have, the local file size matches the size of the same file on the remote server.

I was thinking along the lines of parsing the output of ncftpls -l and using awk, something like

for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
    ...
    x=filesize   # somehow get the file size and the filename
    y=filename   # from $i on each iteration and store in variables
    ...
done

but I can't seem to get both the filename and the filesize from the server into local variables on the same iteration of the loop; $i alternates between $9 and $5 in the awk string with each iteration.

If I could manage to get the filename and filesize into separate variables with each iteration, I could simply use stat -c "%s" "$i" to get the local size and compare it with the remote size. Then it's a simple ncftpget on each remote file that I don't already have, or whose size doesn't match. I tinkered with syncing programs like lftp too, but didn't have much luck and would rather do it this way.

Any help is appreciated!

John Kealy

1 Answer


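A for loop over a command substitution splits the output on any whitespace (space, tab, or newline), which is why $i alternates between the filename and the size. Set IFS to a newline before the loop so that each iteration receives one whole line (there are a lot of questions about ...):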

IFS=$'\n'
for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
    echo "$i" | awk '{print $NF}'      # filesize (last field)
    echo "$i" | awk '{NF--; print}'    # filename (everything before the last field)
    # Filenames may contain spaces, so it is safer to take the size from the
    # last column. Note that $9 above still keeps only the first word of the name.
done
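To see the splitting behaviour in isolation, here is a small sketch using a made-up two-line listing (the names fileA and fileB are hypothetical, not from the FTP server). It only counts loop iterations under the default IFS and under a newline-only IFS:

```shell
#!/usr/bin/env bash
# Hypothetical "name size" listing, one file per line.
listing='fileA 100
fileB 200'

# Default IFS: the unquoted expansion is split on spaces and newlines,
# so the loop body runs once per word.
count_default=0
for i in $listing; do
    count_default=$((count_default + 1))
done

# IFS restricted to newline: the loop body runs once per line.
count_lines=0
old_ifs=$IFS
IFS=$'\n'
for i in $listing; do
    count_lines=$((count_lines + 1))
done
IFS=$old_ifs   # restore the previous splitting behaviour

echo "$count_default $count_lines"   # prints "4 2"
```

Saving and restoring IFS, as done here, keeps the changed splitting from leaking into the rest of the script.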

A better way, I think, is to use a while loop instead of for:

ls -l | while read -r i
do
    echo "$i" | awk '{print $9, $5}'

    # split them into variables if you want
    x=$(echo "$i" | awk '{print $5}')   # filesize
    y=$(echo "$i" | awk '{print $9}')   # filename
done
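Building on the while read idea, read can also split each line into variables itself, which avoids the extra awk calls. The following is only a sketch under some assumptions: the listing has the usual nine-column ls -l layout, the function name files_to_fetch is made up for illustration, and the local size is taken with wc -c rather than stat -c "%s" so that it does not depend on GNU stat. It prints the name of every listed regular file that is missing locally or whose local size differs from the listed size:

```shell
#!/usr/bin/env bash
# files_to_fetch (hypothetical helper): reads `ls -l`-style lines on stdin
# and prints the names that are missing locally or differ in size.
files_to_fetch() {
    local perms links owner group size month day time name local_size
    # The last variable (name) receives the rest of the line,
    # so filenames containing spaces stay in one piece.
    while read -r perms links owner group size month day time name; do
        # Skip the "total N" header, directories, symlinks, and short lines.
        [[ $perms == -* && -n $name ]] || continue

        if [[ ! -f $name ]]; then
            printf '%s\n' "$name"      # not downloaded yet
            continue
        fi

        local_size=$(wc -c < "$name")  # size of the local copy in bytes
        if [ "$local_size" -ne "$size" ]; then
            printf '%s\n' "$name"      # partial or outdated local copy
        fi
    done
}

# Possible usage with the commands from the question (not run here):
#   ncftpls -l 'ftp://theftpserver/path/to/files' | grep "$tdy" | files_to_fetch |
#   while IFS= read -r f; do
#       ncftpget "ftp://theftpserver/path/to/files/$f"
#   done
```

Whether ncftpget replaces an existing partial file is not covered by this sketch, so you may need to remove the local copy before fetching it again.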
Bulikeri