5

I have a 312 MB pkl file. I want to store it on an external server (S3) or a file storage service (for example, Google Drive, Dropbox, or any other). When I run my model, the pkl file should be loaded from that external URL. I have checked out this post but was unable to make it work.

Code:

import urllib
import pickle

Nu_SVC_classifier = pickle.load(urllib.request.urlopen("https://drive.google.com/open?id=1M7Dt7CpEOtjWdHv_wLNZdkHw5Fxn83vW","rb"))

Error:

TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.

ShadowRanger
nilansh bansal
  • 1
    Please don't rewrite your question as a whole new question after it's been (correctly) answered. Ask a new question if necessary, but don't invalidate good, useful answers by making your question about something else entirely. – ShadowRanger Nov 01 '18 at 18:52
  • I apologize, @ShadowRanger, for editing the answer. I am still new to the community. I edited it so that new answerers could see the updated version without getting into the error already solved by Daniel. My question was how to successfully load a pickle file from a URL, which includes that the pickle file should be loaded with pickle.load. – nilansh bansal Nov 01 '18 at 19:00

2 Answers

4

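The second argument of urllib.request.urlopen is the POST data, not a file mode. No mode argument is needed: the response object returned by urlopen is already a binary file-like object, so it can be passed straight to pickle.load.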

import urllib.request
import pickle

Nu_SVC_classifier = pickle.load(urllib.request.urlopen("https://drive.google.com/open?id=1M7Dt7CpEOtjWdHv_wLNZdkHw5Fxn83vW"))
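To see why passing "rb" triggers the POST data error, here is a small sketch that builds request objects without touching the network. The URL is a hypothetical placeholder, and the bytes value b"rb" only stands in for the mode string from the question. Any value in the second position is treated as a request body, which turns the request into a POST.

```python
from urllib.request import Request

# Hypothetical placeholder URL, used only to build request objects offline.
URL = "https://example.com/model.pkl"

# No second argument: a plain GET request, which is what a download needs.
get_request = Request(URL)

# A second positional argument is the request body (POST data), not a file mode.
post_request = Request(URL, b"rb")

print(get_request.get_method())
print(post_request.get_method())
```

In the question the second argument was the str "rb", which is why urlopen complained that POST data cannot be of type str.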
Daniel
  • 2
    It causes another error: **UnpicklingError: invalid load key, '<'.** – nilansh bansal Nov 01 '18 at 18:30
  • @nilanshbansal: One, we have no way of knowing what your bad data is. Two, completely rewriting your question invalidates existing answers; don't rewrite a validly asked and answered question. Ask a new question if need be (but make sure it includes a [MCVE]; we can't psychically divine the data you're trying to load). – ShadowRanger Nov 01 '18 at 18:51
  • 1
    @ShadowRanger I have upvoted the answer myself because it was a good one, but my question has not yet been solved. I apologize for editing the question. The pickle file is absolutely correct: the same file loads locally without an error, so there is definitely some error when we load it from Drive. – nilansh bansal Nov 01 '18 at 18:57
  • @ShadowRanger the error is not because of the data, I get the same error as well. – raquelhortab Jun 28 '22 at 20:20
  • 1
    @raquelhortab: I just checked (apparently the file is open to all). The link used doesn't actually get the file's data; it gets an HTML page that contains interactive links to download the file. You need a direct link to the actual file's data to do anything useful with it (Google seems to make getting such a direct link rather a pain). – ShadowRanger Jun 28 '22 at 22:27
  • That's right! I figured it out yesterday but hadn't had time to add a comment here :) – raquelhortab Jun 29 '22 at 07:05
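The diagnosis in the comments above (the link returns an HTML page, hence the invalid load key '<') suggests checking the downloaded bytes before unpickling. The following is a minimal sketch under that assumption, not a definitive implementation: the function names are hypothetical, it reads the whole response into memory, and it only catches the HTML case. Note that unpickling executes code from the file, so this should only be pointed at URLs you trust.

```python
import pickle
from urllib.request import urlopen


def unpickle_downloaded_bytes(data: bytes):
    """Unpickle data, failing early if it looks like an HTML page.

    Hypothetical helper: '<' is not a valid pickle opcode, so content that
    starts with it is almost certainly a web page, not the pickle file.
    """
    if data.lstrip()[:1] == b"<":
        raise ValueError(
            "downloaded content looks like HTML, not a pickle; "
            "use a direct download link to the file"
        )
    return pickle.loads(data)


def load_pickle_from_url(url: str):
    """Fetch url with a GET request and unpickle the response body."""
    with urlopen(url) as response:
        return unpickle_downloaded_bytes(response.read())
```

Usage would look like `model = load_pickle_from_url(direct_url)`, where `direct_url` is assumed to point at the raw file bytes (for example an S3 object URL) rather than a sharing page.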
1

Try joblib instead of pickle. It works for me.

from urllib.request import urlopen
# sklearn.externals.joblib was removed in scikit-learn 0.23; import the standalone joblib package instead
import joblib
Nu_SVC_classifier = joblib.load(urlopen("https://drive.google.com/open?id=1M7Dt7CpEOtjWdHv_wLNZdkHw5Fxn83vW"))
Ji Zhang