0

I have downloaded many .gz files from the following address:

http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/

How can I check whether the files were truncated during the download (i.e. wget did not download the entire file because of a network interruption)? Thanks.

Bogdan

2 Answers

1

As you can see, each directory contains a file named md5sum.txt. You can verify the files in a directory with a command like:

md5sum -c md5sum.txt  

This will calculate the hashes and compare them with the values in the file.
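When the downloads span many directories, the same check can be run once per directory so that the identically named md5sum.txt files never need to be merged. The following is only a sketch: `verify_tree` is a hypothetical helper name, it assumes GNU `md5sum` and `find`, and it assumes the local tree mirrors the server layout with each md5sum.txt sitting next to the files it lists.

```shell
#!/bin/sh
# Hypothetical helper: run `md5sum -c` inside every directory below a root
# that contains an md5sum.txt (assumes the server layout was kept locally).
verify_tree() {
    root="${1:-.}"
    find "$root" -type f -name md5sum.txt | while IFS= read -r sumfile; do
        dir=$(dirname "$sumfile")
        # --quiet suppresses the "OK" lines, so only problems are printed.
        (cd "$dir" && md5sum -c --quiet md5sum.txt) || echo "FAILED in $dir"
    done
}
```

It could be called as `verify_tree /path/to/sumstats`. Files listed in md5sum.txt but not downloaded are reported as failures too, which may or may not be what you want.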

Romeo Ninov
  • Thank you. I have also downloaded files from other locations where md5sum files were not available. In addition, the main directory is very complicated and I do not know how to download the md5sum files from each folder. The md5sum files in every folder also share the same name, "md5sum.txt", so when I download them they may overwrite each other. – Bogdan May 31 '22 at 05:12
  • @Bogdan, download the files recursively so that they stay in their own directories. Do not download them all into one directory – Romeo Ninov May 31 '22 at 09:02
0

How can I check whether the files were truncated during the download (i.e. wget did not download the entire file because of a network interruption)?

You might use spider mode to fetch just the response headers, for example

wget --spider http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/Alasoo_2018/exon/Alasoo_2018_exon_macrophage_naive.permuted.tsv.gz

gives output

Spider mode enabled. Check if remote file exists.
--2022-05-30 09:38:55--  http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/Alasoo_2018/exon/Alasoo_2018_exon_macrophage_naive.permuted.tsv.gz
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.193.138
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.138|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 645718 (631K) [application/octet-stream]
Remote file exists.

Length is the size of the remote file in bytes, so comparing it with the size of your local file tells you whether the download is complete.
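The comparison can be scripted. This is a minimal sketch, not a definitive tool: `remote_length` and `check_size` are hypothetical helper names, and it assumes the server reports a numeric Length (if it prints "Length: unspecified", the check reports a mismatch). `LC_ALL=C` is set so that wget's messages are not translated.

```shell
#!/bin/sh
# Read wget --spider output on stdin and print the byte count from "Length:".
remote_length() {
    awk '/^Length:/ { print $2; exit }'
}

# Hypothetical helper: compare the remote Length of a URL with a local file.
check_size() {
    url="$1"; file="$2"
    # wget writes its log to stderr, so redirect it into the pipe.
    remote=$(LC_ALL=C wget --spider "$url" 2>&1 | remote_length)
    # wc -c counts bytes; tr strips the padding some implementations add.
    local_size=$(wc -c < "$file" | tr -d ' ')
    if [ "$remote" = "$local_size" ]; then
        echo "OK $file"
    else
        echo "MISMATCH $file (local $local_size, remote ${remote:-unknown})"
    fi
}
```

For example, `check_size "$url" Alasoo_2018_exon_macrophage_naive.permuted.tsv.gz` with `$url` set to the address above. A matching size only shows the file is not truncated; it does not detect corrupted bytes the way a checksum does.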

If you want to download any missing parts rather than merely check for completeness, take a look at the -c option. From the wget man page:

-c

--continue

Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program.(...)
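As a small illustration of -c applied to several files at once, the sketch below combines it with wget's -i option, which reads URLs from a file. `resume_downloads` and the list file are hypothetical; the function assumes it is run in the directory that holds the partial downloads.

```shell
#!/bin/sh
# Hypothetical helper: resume every URL listed, one per line, in a text file.
# Run it from the directory that contains the partially downloaded files.
resume_downloads() {
    list="$1"
    # -c continues partial files; -i reads the URLs from the given file.
    wget -c -i "$list"
}
```

Usage would look like `resume_downloads urls.txt`, where urls.txt is a list you maintain yourself.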

Daweo
  • Thank you for your suggestions. If I use wget --spider on many directories/folders, how can I record the wget --spider messages (e.g. Length: 645718 (631K) [application/octet-stream]; Remote file exists.) in one single file? – Bogdan May 31 '22 at 05:17
  • @Bogdan just append what wget writes to stderr to a file, e.g. `wget --spider http://www.example.com 2>>wgetoutput.txt` – Daweo May 31 '22 at 07:04