After an unsuccessful read of GitPython's documentation, I thought I'd raise my question here.

I'm working in Python 3.10 and would like to clone a single folder from a repository, namely the yml subfolder. I do not need the entire repo.

https://github.com/LOLBAS-Project/LOLBAS/tree/master/yml

Once initially cloned, I'd like to check whether the subfolder has had any updates and if so, I'd like to pull them to the yml folder.

As of now, I have a function that clones the entirety of the repo into a local directory.

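import git


def repoCheck():
    # Clone the entire repository into a local "LOLBAS" directory
    try:
        git.Repo.clone_from('https://github.com/LOLBAS-Project/LOLBAS', 'LOLBAS')
    except git.GitCommandError as exception:
        # Raised, for example, when the target directory already exists and is not empty
        print(exception)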

This leaves me with (example):

C:\Users\ExampleUser\Documents\Lolbas

Lolbas/
├─ Logos/
├─ yml/
│  ├─ a.yml
│  ├─ b.yml
│  ├─ x.yml
├─ Archive-Old-Version/
│  ├─ x.yml
│  ├─ b.yml
├─ .gitignore
├─ package.json
├─ README.md

But I'd simply like a subfolder extract:

Lolbas/
├─ yml/
│  ├─ a.yml
│  ├─ b.yml
│  ├─ x.yml

Is it possible to clone just this subfolder initially, and then later pull to check whether this specific subfolder is up to date?

Thank you for any help and guidance with this. I don't have much of a solution yet, as I'm not very familiar with Git and couldn't find much information in the GitPython docs.

geojoe
  • You cannot clone a directory; a repository must be cloned entirely. You can download a directory (see https://stackoverflow.com/q/7106012/7976758 found in https://stackoverflow.com/search?q=%5Bgit%5D+download+directory) but you lose the ability to monitor changes; every time, you have to download the directory anew and compare it with the previous one. – phd Feb 08 '22 at 14:24
  • Instead of downloading the directory clone the entire repository but do [sparse checkout](https://stackoverflow.com/a/13738951/7976758). See https://stackoverflow.com/search?q=%5Bgit%5D+sparse+checkout – phd Feb 08 '22 at 14:25
  • @phd thanks, I'm not sure GitPython has the ability to perform sparse-checkout unfortunately. – geojoe Feb 08 '22 at 14:27
  • In that case your best bet is the full clone and full checkout. – phd Feb 08 '22 at 14:30
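
On the sparse-checkout point raised in the comments: GitPython exposes arbitrary Git subcommands through `repo.git`, so a sparse checkout looks reachable from Python. The following is only a minimal sketch, assuming an installed Git recent enough to support `git sparse-checkout set --no-cone` and a default branch named `master`. The function names `sparse_pattern` and `sparse_clone` are hypothetical helpers, not part of GitPython.

```python
def sparse_pattern(subdir: str) -> str:
    """Turn a directory name such as 'yml' into the anchored pattern '/yml/'."""
    return "/" + subdir.strip("/") + "/"


def sparse_clone(url: str, dest: str, subdir: str, branch: str = "master"):
    """Clone `url` into `dest`, populating only `subdir` in the working tree."""
    # Imported lazily so sparse_pattern() stays usable without GitPython installed.
    from git import Repo

    # --no-checkout leaves the working tree empty until the sparse rules are set;
    # --filter=blob:none asks the server to hold back file contents until needed.
    repo = Repo.clone_from(
        url, dest, multi_options=["--no-checkout", "--filter=blob:none"]
    )
    # GitPython turns sparse_checkout into the `git sparse-checkout` subcommand.
    # Non-cone mode is used because cone mode also keeps top-level files.
    repo.git.sparse_checkout("set", "--no-cone", sparse_pattern(subdir))
    repo.git.checkout(branch)
    return repo
```

Under those assumptions, `sparse_clone('https://github.com/LOLBAS-Project/LOLBAS', 'LOLBAS', 'yml')` should leave only the yml folder in the working tree, and a later `repo.remotes.origin.pull()` should update just that folder.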

1 Answer

This is how I was able to pull a specific directory from a git repo:

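from git import Repo

repo = Repo.init("path/to/local/repo")

# Reuse the "origin" remote if it already exists; otherwise create it.
# (Indexing repo.remotes[0] fails on a freshly initialised repo with no remotes.)
if "origin" in [remote.name for remote in repo.remotes]:
    origin = repo.remote("origin")
else:
    origin = repo.create_remote("origin", "https://github.com/LOLBAS-Project/LOLBAS")

# Fetch from the remote, then check out only the yml directory from origin/master
origin.fetch()
repo.git.checkout("origin/master", "--", "yml")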

As for pulling any new updates, I suggest just removing the yml directory entirely before running the above code. There are probably better ways of doing this, but I find this to be the most straightforward.
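
One possible alternative to deleting the directory each time is to fetch and ask Git which files under yml differ from `origin/master`, and only re-run the checkout when something changed. This is a rough sketch rather than a tested recipe: `changed_paths` and `refresh_subdir` are hypothetical helper names, and `repo` is assumed to be a `git.Repo` that already has an `origin` remote, as set up above.

```python
def changed_paths(repo, subdir: str = "yml", ref: str = "origin/master") -> list[str]:
    """Fetch from origin and list files under `subdir` that differ from `ref`."""
    repo.remotes.origin.fetch()
    # `git diff <ref> -- <path>` compares the local working tree against <ref>.
    output = repo.git.diff("--name-only", ref, "--", subdir)
    return output.splitlines()


def refresh_subdir(repo, subdir: str = "yml", ref: str = "origin/master") -> list[str]:
    """Re-checkout `subdir` from `ref` only if it is out of date; return what changed."""
    changed = changed_paths(repo, subdir, ref)
    if changed:
        # Overwrites local copies with the versions from <ref>. Files deleted
        # upstream are not removed by this command and would need separate cleanup.
        repo.git.checkout(ref, "--", subdir)
    return changed
```

Calling `refresh_subdir(repo)` returns an empty list when the local yml folder already matches the remote, which doubles as the "is it up to date?" check.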