
I created a zip file with the Linux zip command and uploaded it to my Google Drive. When I try to download and unzip it using curl and unzip (from a bash script), I get the following error.

Archive:  pretrained_models.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of pretrained_models.zip or
        pretrained_models.zip.zip, and cannot find pretrained_models.zip.ZIP, period.

Can anyone suggest any workaround to fix this issue?

In case anyone wants to reproduce the error, here is the .sh file.

#!/bin/bash
pretrained='https://drive.google.com/uc?export=download&id=0B8ZGlkqDw7hFSm1MQ2FDVTZCTjA' 
# download pretrained models.
curl -o pretrained_models.zip $pretrained
unzip pretrained_models.zip
rm pretrained_models.zip

The file is publicly shared. As a sanity check, you can download it from here.

N.B. I have seen related posts in other SO communities, and some of them suggested using a different file extension, but I want to stick with a zip file.

Wasi Ahmad

1 Answer


When a shared file on Google Drive is downloaded, the download method has to change depending on the file size. The boundary at which the method changes was found to be about 40MB.

Modified scripts:

1. File size < 40MB

#!/bin/bash
filename="pretrained_models.zip"
fileid="0B8ZGlkqDw7hFSm1MQ2FDVTZCTjA"
# -L makes curl follow redirects instead of saving the redirect response.
curl -L -o "${filename}" "https://drive.google.com/uc?export=download&id=${fileid}"

2. File size > 40MB

When a file larger than 40MB is requested, Google responds with a page that asks you to download from the following URL.

<a id="uc-download-link" class="goog-inline-block jfk-button jfk-button-action" href="/uc?export=download&amp;confirm=####&amp;id=### file ID ###">download</a>
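
Since this response is HTML rather than the archive, saving it as pretrained_models.zip would plausibly lead to the unzip error shown in the question. The following is a minimal sketch of a check that could run before unzip. The function name `looks_like_zip` is hypothetical, and the test is only a heuristic: it assumes the archive starts with the two bytes "PK" of a zip header, which will not hold for archives with data prepended to them.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): succeed only if
# the file begins with "PK", the first two bytes of a zip header.
looks_like_zip() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Possible usage after the download step:
#   if looks_like_zip pretrained_models.zip; then
#     unzip pretrained_models.zip
#   else
#     echo "Download is not a zip file (maybe the confirmation page)." >&2
#   fi
```

If the check fails, opening the downloaded file in a text editor should show whether it contains the confirmation page.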

The confirm=#### query parameter is important for downloading large files. To retrieve it from the HTML, the following script uses pup, a command-line HTML parser.

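#!/bin/bash
filename="pretrained_models.zip"
fileid="0B8ZGlkqDw7hFSm1MQ2FDVTZCTjA"
# First request: save the cookies and extract the confirm link from the returned HTML.
query=$(curl -c ./cookie.txt -s -L "https://drive.google.com/uc?export=download&id=${fileid}" | pup 'a#uc-download-link attr{href}' | sed -e 's/amp;//g')
# Second request: send the cookies back and download the file itself.
curl -b ./cookie.txt -L -o "${filename}" "https://drive.google.com${query}"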
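
If pup is not installed, the same link could probably be pulled out with grep and sed. This is only a sketch: the function name `extract_download_href` is hypothetical, and it assumes the anchor tag looks like the one quoted above, with `id="uc-download-link"` as its first attribute and the href in double quotes. A real HTML parser such as pup is more robust if the markup differs.

```shell
#!/bin/bash
# Hypothetical helper: read HTML on stdin and print the href of the
# <a id="uc-download-link" ...> tag, with "&amp;" decoded back to "&".
extract_download_href() {
  grep -o '<a id="uc-download-link"[^>]*>' \
    | sed -n 's/.*href="\([^"]*\)".*/\1/p' \
    | sed 's/&amp;/\&/g'
}

# Possible usage in place of the pup pipeline:
#   query=$(curl -c ./cookie.txt -s -L "https://drive.google.com/uc?export=download&id=${fileid}" | extract_download_href)
```

The function prints nothing when the link is absent, so an empty `query` can be treated as a sign that the page did not have the expected form.
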
Tanaike
  • @Wasi Ahmad I'm sorry for the inconvenience. Can I ask you the following questions? 1. Is the file with ID 0B8ZGlkqDw7hFSm1MQ2FDVTZCTjA a zip file? 2. Is the file shared? 3. When you run ``curl -L 'https://drive.google.com/uc?export=download&id=0B8ZGlkqDw7hFSm1MQ2FDVTZCTjA'``, are there any errors? Can you see the binary data? In my Google Drive, I confirmed that the modified script works fine. – Tanaike Aug 04 '17 at 00:27
  • @Wasi Ahmad And please confirm the file ID again. – Tanaike Aug 04 '17 at 00:34
  • Yes, the file is public. I have updated my post with the link. Running curl command with -L gives the same error. – Wasi Ahmad Aug 04 '17 at 00:34
  • @Wasi Ahmad Can I try downloading the file with your file ID? – Tanaike Aug 04 '17 at 00:35
  • Yeah sure. The id is 0B8ZGlkqDw7hFSm1MQ2FDVTZCTjA – Wasi Ahmad Aug 04 '17 at 00:35
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/151007/discussion-between-wasi-ahmad-and-tanaike). – Wasi Ahmad Aug 04 '17 at 00:37
  • @Wasi Ahmad Updated my answer. It was found that the boundary is about 40MB. – Tanaike Aug 05 '17 at 02:26