
Old Problem:

I’m trying to use Binder to share my Git repo, but the git repo has git-lfs files.

If I try to use the repo when pointer files to the git-lfs files are present instead of the actual files, I get the following error from Binder: `Smudge error: Error downloading: [404] Object does not exist on the server`.
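For context, a Git LFS pointer file is a small text stub that stands in for the real file and names the object to fetch. The following is a minimal sketch that parses one, assuming the standard pointer layout (`version`, `oid sha256:...`, and `size` lines). The function name `parse_lfs_pointer` and the sample digest are made up for illustration.

```python
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text):
    """Return the pointer's fields as a dict, or None if text is not a pointer."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if sep:  # keep only "key value" lines
            fields[key] = value
    if fields.get("version") != LFS_SPEC_URL or "oid" not in fields:
        return None
    algo, _, digest = fields["oid"].partition(":")
    return {"hash_algo": algo, "digest": digest, "size": int(fields.get("size", 0))}


# Illustrative pointer content; the digest is a placeholder, not a real object.
sample = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + "ab" * 32 + "\n"
    "size 12345\n"
)
info = parse_lfs_pointer(sample)
print(info["hash_algo"], info["size"])
```

The 404 above would then mean that the LFS server has no object matching the `oid` recorded in the pointer.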

If I try to use the repo when the actual git-lfs files are present, I get the following error from Binder: `Error: ErrImagePull`.

New Problem:

I got a pickling error after trying to open a pickle file in a Jupyter notebook in Binder.

  • This looks like a problem in Git-LFS (it's not Git itself, it's the LFS part). I changed one tag. – torek Oct 20 '21 at 22:04
  • @torek Do you know what kind of problem this might be, or if there is a way around using Git-LFS? – Andrew Hinh Oct 20 '21 at 23:20
  • I don't know what to *do* about the problem, but the problem itself is pretty clear: the file stored in the repository says "go to server X and retrieve file number Y" and while server X exists and responds to requests, it says "I have no file number Y". – torek Oct 21 '21 at 02:19
  • @torek I edited my question to include a screenshot of the error I get even when I have the actual files stored in git-lfs and present in github. Do you think this is still a git-lfs issue? – Andrew Hinh Oct 21 '21 at 17:21
  • Yes: Git-LFS is written in Go and uses GRPC and the error message is describing some sort of internal error that is occurring somewhere around the place that Git-LFS is using GRPC to retrieve the large file. – torek Oct 21 '21 at 17:26
  • See the `rpc error` part, with `code = unknown` (that seems ... wrong, but it depends on what they put into the grpc setup) and `desc = ...` where the description talks about some daemon (some sort of LFS daemon I assume) getting an error trying to read a manifest (list of files). – torek Oct 21 '21 at 17:28
  • In this case, the failure inside the LFS code may well be related to this Binder thingy. Does Binder use FUSE to provide a file system? If so, perhaps it isn't providing the right FS operations for the LFS back end. – torek Oct 21 '21 at 17:29
  • @torek From the official website, the following are the projects that Binder uses: JupyterHub, which manages cloud infrastructure for user instances, repo2docker, which builds Docker images from GitHub repositories, and BinderHub, which orchestrates the above two projects and provides the Binder interface. – Andrew Hinh Oct 21 '21 at 17:35
  • Yeah, but that doesn't tell me how the Go runtime will "see" the file system. Git-LFS runs on the LFS server and just uses ordinary file access. – torek Oct 21 '21 at 17:38
  • @torek Ok, since this seems to be Git-LFS being buggy, I'll try reinstalling git-lfs. Does `git lfs uninstall` suffice, or is there more to uninstalling it? – Andrew Hinh Oct 21 '21 at 17:55
  • @torek I managed to remove the files from Git-LFS, reinstall Git-LFS, and upload the files again, but still got the same error. – Andrew Hinh Oct 22 '21 at 00:42
  • I don't know that LFS is *buggy* here: it may just not *work with the binder thingy*. You'd need someone who has actually used one or both systems at this point. All I can say for sure is that Git itself is quite out of the loop here. – torek Oct 22 '21 at 03:59
  • @torek Thanks for all your help. As noted in https://forums.fast.ai/t/how-do-i-use-binder-with-a-github-repo-that-has-git-lfs-files/91952/3, I've moved my model files to Google Drive and used wget to pull it from the notebook to avoid having to store the files in Git-LFS. The docker image builds! – Andrew Hinh Oct 22 '21 at 04:55
  • @torek There's a new problem now: I got the new error mentioned in the question when I tried to load a .pkl file into the .ipynb file in Binder. Could this be an issue with uploading the model files to Google Drive, a requirements.txt issue, a runtime issue, or something else? – Andrew Hinh Oct 22 '21 at 09:51
  • I feel you have branched this question so many times that it really isn't suited for Stack Overflow now. This would have made a better [Jupyter Discourse Forum](https://discourse.jupyter.org/) thread, or each of your issues could have been a separate Stack Overflow question. You could have posted there about needing help with your repo under the 'Binder help-wanted' category and had an exchange as you got things working. [Here is an example](https://discourse.jupyter.org/t/cor-and-hist-function-of-r-not-working-in-binder/10099?u=fomightez) of such an exchange. – Wayne Oct 22 '21 at 15:54
  • @Wayne Thanks for the advice. I actually posted this question in the Fastai Forums and talked with someone about moving my files to Google Drive and using !wget to get my files through Jupyter Notebook. – Andrew Hinh Oct 22 '21 at 16:28
  • I agree with @Wayne here in at least one way: once you have a *new and different problem*, it needs a *new and different question*. The only part I don't know is whether that question is best asked on another forum entirely. :-) A pickling error like the one you observed tends to mean either that the byte stream was damaged, or that something has changed versions. – torek Oct 22 '21 at 17:47
  • @torek Thanks for this: I'll make sure to do this next time. As for the pickling error, the byte stream may have been damaged when I uploaded or downloaded the files to or from Google Drive. If not that, then as you pointed out, there's probably something off about my requirements.txt. – Andrew Hinh Oct 22 '21 at 17:52
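One way to check the "damaged byte stream" possibility raised in the comments is to compare checksums of the local and downloaded copies and to look at the first bytes of the downloaded file. This is a rough sketch, not a definitive diagnostic: the helper names `sha256_of` and `sniff_file_kind` are hypothetical, and the magic-byte checks are heuristics (a zip header is included on the assumption that the model may have been saved in a zip-based format).

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sniff_file_kind(path):
    """Guess what a supposed model file really contains from its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(64)
    stripped = head.lstrip().lower()
    if head.startswith(b"version https://git-lfs"):
        return "git-lfs pointer"  # the stub was fetched, not the real file
    if stripped.startswith(b"<!doctype html") or stripped.startswith(b"<html"):
        return "html page"  # e.g. an error or confirmation page was saved
    if head.startswith(b"PK\x03\x04"):
        return "zip archive"
    if head.startswith(b"\x80"):
        return "pickle stream"  # pickle protocol 2 and later start with 0x80
    return "unknown"
```

If `sha256_of` gives the same digest locally and in Binder, the file arrived intact and a version mismatch becomes the more likely cause.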

2 Answers

1

Old Problem's Answer: I moved my model files to Google Drive and used `!wget` to pull them from the notebook, which avoids having to store the files in Git-LFS.
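For anyone who prefers to do the download from Python instead of `!wget`, here is a minimal sketch using only the standard library. The `uc?export=download&id=...` URL pattern is an assumption about how publicly shared Drive files are commonly addressed, the file ID is a placeholder, and the helper names are hypothetical. Large files may return an HTML confirmation page instead of the file itself, so check the result before unpickling.

```python
import os
import shutil
import urllib.parse
import urllib.request


def drive_download_url(file_id):
    """Build a direct-download URL for a publicly shared Drive file (assumed pattern)."""
    query = urllib.parse.urlencode({"export": "download", "id": file_id})
    return "https://drive.google.com/uc?" + query


def fetch(url, dest):
    """Stream url to dest and return the number of bytes written."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return os.path.getsize(dest)


# Example call (placeholder ID, not executed here):
# fetch(drive_download_url("YOUR_FILE_ID"), "model.pkl")
```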

New Problem's Answer: I had to pin all of my packages' versions in requirements.txt to the ones I used in my local virtual env.
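A small sketch of how those pins could be generated from the local environment, assuming Python 3.8+ for `importlib.metadata`. The function name `pin_installed` is hypothetical, and the package names in the example are only illustrative; list whatever your notebook actually imports.

```python
from importlib import metadata


def pin_installed(package_names):
    """Return 'name==version' lines for the named packages that are installed."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment, so there is nothing to pin.
            continue
    return lines


if __name__ == "__main__":
    # Illustrative names; run this inside the local virtual env.
    print("\n".join(pin_installed(["fastai", "torch"])))
```

Running this locally and pasting the output into requirements.txt should make Binder install the same versions that produced the pickle file.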

1

In response to the current title of your post, 'How do I use Binder with a GitHub repo that has Git-lfs files?': there seems to be an example that might help you.

If you go to the Jupyter Discourse Forum and search 'git-lfs', you'll end up somewhere like here. Presently there are two related posts:

The second one appears to link to Emtion Faces ("Emotion classifier trained with Fastai, displayed with a Jupyter notebook and Voila, deployed with Binder"), which uses git-lfs, fastai, and pickling, so it may help you.
In particular, with regard to git-lfs, I noted that the pickle file is stored with git-lfs, see here.

Wayne
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/238526/discussion-on-answer-by-wayne-how-do-i-use-binder-with-a-github-repo-that-has-gi). – Machavity Oct 26 '21 at 02:54