
Given this code:

# Python 3 / recent dulwich: object contents, identities and ref names are bytes
from dulwich.objects import Blob, Tree, Commit, parse_timezone
from dulwich.repo import Repo
from time import time

repo = Repo.init("myrepo", mkdir=True)
blob = Blob.from_string(b"my file content\n")
tree = Tree()
tree.add(b"spam", 0o100644, blob.id)
commit = Commit()
commit.tree = tree.id

author = b"Flav <foo@bar.com>"
commit.author = commit.committer = author
commit.commit_time = commit.author_time = int(time())
tz = parse_timezone(b'+0200')[0]
commit.commit_timezone = commit.author_timezone = tz
commit.encoding = b"UTF-8"
commit.message = b"initial commit"

o_sto = repo.object_store
o_sto.add_object(blob)
o_sto.add_object(tree)
o_sto.add_object(commit)

repo.refs[b"HEAD"] = commit.id

I end up with the commit in the history, but the created file shows up as deleted (`git status` says so).

Running `git checkout .` fixes it.

My question is: how do I do `git checkout .` programmatically with dulwich?

CharlesB
Flavius

4 Answers


`git status` says it's deleted because the file doesn't exist in the working copy; that's why checking it out fixes the status.

It looks like there's no support for high-level working-copy classes and functions in dulwich yet. You'd have to deal with trees, blobs and unpacking objects yourself.

OK, I took the challenge and managed to make a basic checkout with Dulwich:

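import os
import stat
from dulwich.repo import Repo

#get repository object of current directory
repo = Repo('.')
#get tree corresponding to the head commit
tree_id = repo[b"HEAD"].tree
#iterate over tree content, giving path, mode and blob sha.
for entry in repo.object_store.iter_tree_contents(tree_id):
  if not stat.S_ISREG(entry.mode):
    continue  #skip symlinks and submodules (not handled here)
  path = entry.in_path(os.fsencode(repo.path)).path
  os.makedirs(os.path.dirname(path), exist_ok=True)
  with open(path, 'wb') as file:
    #write blob's content to file
    file.write(repo[entry.sha].as_raw_string())
  #apply the permission bits (matters when core.filemode is true)
  os.chmod(path, stat.S_IMODE(entry.mode))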

It won't delete files that should be deleted, won't care about your index, and doesn't handle symlinks or submodules.
See also Mark Mikofski's GitHub project for more complete code based on this.

CharlesB
  • +1 for using `with open ...` instead of `f.close()`! Also you can add `in_path()` to `entry.path` which will append `` to the `TreeEntry` named tuple. see [dulwich API doc](http://www.samba.org/~jelmer/dulwich/apidocs/dulwich.objects.TreeEntry.html) – Mark Mikofski Sep 10 '12 at 06:10
  • I will if I come up with anything better. FYI `d.repo.BaseRepo.get_blob(sha)` raises `NotBlob` error, instead of `get_object`, otherwise it's exactly the same. Also `d.file.ensure_dir_exists(os.path.split(entry.in_path(repo.path).path)[0])` does a nice job of making your directories, if they don't already exist. Finally `d.GitFile(path, mode)` does the same thing as `file`. Do you know what the difference between `as_raw_string` and `as_pretty_string` is? They seem the same. I started a dulwich porcelain repo for more of these snippets on github. – Mark Mikofski Sep 11 '12 at 06:50
  • this doesn't set the mode, so git status still says deleted or untracked, so use `chmod entry.mode entry.in_path(repo.path).path`. Just one thing, not sure about "The file mode is like the octal argument you could give to the chmod command. Except it is in extended form to tell regular files from directories and other types." [dulwich introduction: the tree](http://www.samba.org/~jelmer/dulwich/docs/tutorial/introduction.html) – Mark Mikofski Sep 11 '12 at 07:50
  • [ignore file mode](http://stackoverflow.com/questions/1580596/how-do-i-make-git-ignore-mode-changes-chmod) – Mark Mikofski Sep 11 '12 at 08:18
  • last comment I swear, `Blob.data` is same as `as_raw_string()` [d.o.Blob](http://www.samba.org/~jelmer/dulwich/apidocs/dulwich.objects.Blob.html) – Mark Mikofski Sep 11 '12 at 08:27
  • your edit suggestion was rejected, but shouldn't have, can you submit again? I'll approve – CharlesB Sep 11 '12 at 08:34
  • I added dulwich_checkout.py to the [dulwich porcelain repo these snippets on github](https://github.com/mikofski/dulwichPorcelain). – Mark Mikofski Sep 12 '12 at 18:16
  • My first edit that uses `ensure_dir_exists(...)` to create folders _was_ accepted, however, I tried to edit it again to use `os.chmod(entry.in_path(repo.path).path,entry.mode)` but that edit was rejected. The os.chmod() is important if `filemode=true` in git config. – Mark Mikofski Sep 15 '12 at 23:03
  • One more thing, dulwich complains that `get_blob(sha)` or `get_object(sha)` are deprecated (dulwich-0.8.5) and to now use `repo[sha]` instead which works fine. Also `Blob.data` attribute works just as well as `Blob.as_string()`. – Mark Mikofski Sep 16 '12 at 06:01
  • @MarkMikofski edited; can you [attribute the origin](http://blog.stackoverflow.com/2009/06/attribution-required/) somewhere on your project? – CharlesB Sep 16 '12 at 06:45
  • @MarkMikoski noticed that the [feature request](https://bugs.launchpad.net/dulwich/+bug/719026) for repo checkout was fixed! now it's in the library, [see the corresponding merge](https://github.com/milki/dulwich/commit/e6bc1d3bf08abbce3bd1e07bb1860e116330f2fd) – CharlesB Sep 17 '12 at 18:35
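To make the mode discussion in the comments concrete: a git tree-entry mode packs the object type into the high bits and the permission bits into the low bits, so Python's standard `stat` module can split it apart. The helper `checkout_action` below is hypothetical (not part of dulwich); it's a sketch of how a checkout loop could decide what to do with each entry and what to pass to `os.chmod()`, assuming the usual git modes 100644, 100755, 120000, 040000 and 160000.

```python
import stat

# Submodule ("gitlink") entries point at a commit in another repository.
# 0o160000 is not a filesystem file type, so stat has no predicate for it.
GIT_MODE_GITLINK = 0o160000


def checkout_action(mode):
    """Classify a git tree-entry mode as (kind, permission_bits).

    permission_bits is only set for regular files; it is the value a
    checkout loop would hand to os.chmod() after writing the blob.
    """
    if stat.S_IFMT(mode) == GIT_MODE_GITLINK:
        return ("submodule", None)
    if stat.S_ISDIR(mode):      # 0o040000: a subtree
        return ("tree", None)
    if stat.S_ISLNK(mode):      # 0o120000: blob content is the link target
        return ("symlink", None)
    if stat.S_ISREG(mode):      # 0o100644 or 0o100755
        # S_IMODE drops the file-type bits, e.g. 0o100755 -> 0o755
        return ("file", stat.S_IMODE(mode))
    raise ValueError(f"unexpected git mode {mode:o}")
```

For example, `checkout_action(0o100755)` yields `("file", 0o755)`, which is why `os.chmod(path, entry.mode)` mostly works on Linux (the type bits get ignored) but masking the mode first is cleaner.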

It is now possible since release 0.8.4, with the function `dulwich.index.build_index_from_tree()`.

It writes a tree to both the index file and the filesystem (working copy), which is a very basic form of checkout.

See the note:

existing index is wiped and contents are not merged in a working dir. Suitable only for fresh clones

I got it to work with the following code:

from dulwich import index
from dulwich.repo import Repo
#get repository object of current directory
repo = Repo('.')
indexfile = repo.index_path()
#we want to checkout HEAD
tree = repo[b"HEAD"].tree

index.build_index_from_tree(repo.path, indexfile, repo.object_store, tree)
CharlesB

In case you want to check out an existing branch from a remote repository, this is how I finally managed to do it:

from dulwich import porcelain
gitlab_server_address = 'https://gitlab.example.com/foo/my_remote_repo.git'
username = 'foo@bar.com'
password = 'mocraboof'

repo = porcelain.clone(gitlab_server_address, target='myrepo', username=username, password=password)

# or if repo already exists: 
# repo = porcelain.open_repo('myrepo')

branch_name = 'thebranch'
porcelain.branch_create(repo, branch_name)
porcelain.update_head(repo, target=branch_name, detached=False, new_branch=None)

porcelain.pull(repo, gitlab_server_address, refspecs=f'refs/heads/{branch_name}', username=username, password=password)

The problem was that when you clone a repository with dulwich, it only fetches the main/master branch, and I couldn't find another way to fetch the others. So I create the branch as a new branch from main/master and then pull from the remote.

(This might not work if your main branch has moved ahead of the commit your remote branch was started from.)

Saee Saadat
import os
from dulwich.repo import Repo

repo = Repo.init('myrepo', mkdir=True)
with open(os.path.join('myrepo', 'spam'), 'w') as f:
    f.write('my file content\n')
repo.stage(['spam'])
repo.do_commit(b'initial commit', committer=b'Flav <foo@bar.com>')

Found by looking at dulwich/tests/test_repository.py:371. dulwich is powerful but the docs are a bit lacking, unfortunately.

You may also want to consider using `GitFile` instead of `open`.

raylu
  • -1 seeing the source, it does not checkout the commit; it is a wrapper of the OP's code – CharlesB Jul 10 '11 at 12:00
  • actually it works because you write the file to the working copy, so there's no need to check out, but it doesn't answer the OP question. – CharlesB Jul 10 '11 at 16:04
  • It's a wrapper of the OP's code that produces the end result he wants in less lines of code. It's not merely that it writes the file to the working directory; it uses it to perform the commit. This is the "correct" way to use dulwich to do what the OP is doing. – raylu Jul 10 '11 at 18:34
  • Sure, it's a nicer way to do a commit, but it doesn't say how to checkout, which I found to be an interesting problem. – CharlesB Jul 10 '11 at 18:45
  • This solution is ok for me too, as I'll have the data first, then I'll add it. It doesn't answer the question though, and I'm still curious how to *checkout a branch*. – Flavius Jul 10 '11 at 19:05
  • @Flavius: to checkout branch replace `repo["HEAD"].tree` with `repo['refs/heads/yourbranch'].tree` in my answer – CharlesB Jul 10 '11 at 20:27
  • Is it possible that repo.do_commit() only allows committing to refs/heads/master? It looks like one needs to be able to set Commit.parents before committing, which can't be done with this method. So apparently the simplification made in this answer is quite limiting; it may be better to extend the original code. – Daniel F Aug 01 '11 at 22:18