37

I want to retrieve all previous versions of a specific file in a git repository.

I see it is possible to get one specific version with the checkout command, but I want them all. And the git clone command with the depth option doesn't seem to allow me to clone a subfolder ("not valid repository name").

Do you know if it is possible and how?

Thank you

max152

6 Answers

45

The OP wanted to retrieve all versions, but the other answers do not deliver that, especially when the file has hundreds of revisions (all of the suggestions are too manual). The only half-working solution was proposed by @Tobias in the comments, but when run against our repos the suggested bash loop wrote the files in no particular order and generated hundreds of empty files. One of the reasons is that "rev-list --all --objects" lists several kinds of objects, including trees, which are useless for our purpose.
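To see which kinds of objects that command reports for a path, something like the following could be used. It is only a sketch: `list_object_types` is a made-up helper name, and it assumes it is run from the root of the repository.

```shell
# Print "<type> <id>" for every object that
# `git rev-list --all --objects` reports for the given path.
# list_object_types is a hypothetical helper name, not a git command.
list_object_types() {
    git rev-list --all --objects -- "$1" |
        cut -d ' ' -f1 |
        while read -r id; do
            # git cat-file -t prints the object type: commit, tree or blob
            printf '%s %s\n' "$(git cat-file -t "$id")" "$id"
        done
}
```

For example, `list_object_types docs/howto/readme.txt | cut -d ' ' -f1 | sort | uniq -c` gives a count per object type. Only the commit ids are usable on the left of `commit:path`.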

I started with Tobias's solution, added counters, cleaned it up a bit, and ended up reinventing the wheel in the form of the bash script listed below.

The script would:

  • extract all file versions to /tmp/all_versions_exported
  • take 1 argument - relative path to the file inside git repo
  • give result filenames numeric prefix (sortable)
  • mention inspected filename in result files (to tell apples apart from oranges:)
  • mention commit date in the result filename (see output example below)
  • not create empty result files

cat /usr/local/bin/git_export_all_file_versions

#!/bin/bash

# we'll write all git versions of the file to this folder:
EXPORT_TO=/tmp/all_versions_exported

# take relative path to the file to inspect
GIT_PATH_TO_FILE=$1

# ---------------- don't edit below this line --------------

USAGE="Please cd to the root of your git project and specify the path to the file you wish to inspect (example: $0 some/path/to/file)"

# check if got argument
if [ "${GIT_PATH_TO_FILE}" == "" ]; then
    echo "error: no arguments given. ${USAGE}" >&2
    exit 1
fi

# check if file exists (quoted so that paths with spaces work)
if [ ! -f "${GIT_PATH_TO_FILE}" ]; then
    echo "error: File '${GIT_PATH_TO_FILE}' does not exist. ${USAGE}" >&2
    exit 1
fi

# extract just a filename from given relative path (will be used in result file names)
GIT_SHORT_FILENAME=$(basename "$GIT_PATH_TO_FILE")

# create folder to store all revisions of the file
if [ ! -d "${EXPORT_TO}" ]; then
    echo "creating folder: ${EXPORT_TO}"
    mkdir "${EXPORT_TO}"
fi

## uncomment next line to clear export folder each time you run script
#rm ${EXPORT_TO}/*

# reset counter
COUNT=0

# iterate all revisions
git rev-list --all --objects -- "${GIT_PATH_TO_FILE}" | \
    cut -d ' ' -f1 | \
while read -r h; do
     COUNT=$((COUNT + 1))
     COUNT_PRETTY=$(printf "%04d" "$COUNT")
     # only commits print a "Date:" header; trees and blobs give an empty date and are skipped
     COMMIT_DATE=$(git show "$h" | head -3 | grep 'Date:' | awk '{print $4"-"$3"-"$6}')
     if [ "${COMMIT_DATE}" != "" ]; then
         git cat-file -p "${h}:${GIT_PATH_TO_FILE}" > "${EXPORT_TO}/${COUNT_PRETTY}.${COMMIT_DATE}.${h}.${GIT_SHORT_FILENAME}"
     fi
done

# return success code
echo "result stored to ${EXPORT_TO}"
exit 0

Usage example:
cd /home/myname/my-git-repo

git_export_all_file_versions docs/howto/readme.txt
    result stored to /tmp/all_versions_exported

ls /tmp/all_versions_exported
    0001.17-Oct-2016.ee0a1880ab815fd8f67bc4299780fc0b34f27b30.readme.txt
    0002.3-Oct-2016.d305158b94bedabb758ff1bb5e1ad74ed7ccd2c3.readme.txt
    0003.29-Sep-2016.7414a3de62529bfdd3cb1dd20ebc1a977793102f.readme.txt
    0004.28-Sep-2016.604cc0a34ec689606f7d3b2b5bbced1eece7483d.readme.txt
    0005.28-Sep-2016.198043c219c81d776c6d8a20e4f36bd6d8a57825.readme.txt
    0006.9-Sep-2016.5aea5191d4b86aec416b031cb84c2b78603a8b0f.readme.txt
    <and so on and on . . .>

Note #1: if you see errors like this:

fatal: Not a valid object name
3e93eba38b31b8b81905ceaa95eb47bbaed46494:readme.txt

it means you've started the script not from the root folder of your git project.
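The reason is that `commit:path` is resolved relative to the root of the repository, not the current directory. As a sketch, the root-relative form of a path can be built with `git rev-parse --show-prefix`; `repo_relative_path` is a made-up helper name, and it assumes a plain relative path without `..` components.

```shell
# Print a path the way git sees it from the repository root.
# repo_relative_path is a hypothetical helper name.
repo_relative_path() {
    # --show-prefix prints the current directory relative to the top level:
    # empty at the root, otherwise ending in "/"
    printf '%s%s\n' "$(git rev-parse --show-prefix)" "$1"
}
```

Run inside `docs/howto`, `repo_relative_path readme.txt` prints `docs/howto/readme.txt`. Alternatively, `cd "$(git rev-parse --show-toplevel)"` moves to the root before the script is started.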

Note #2: if you want to get all versions of a file that was deleted a few commits ago, you first have to switch to any old commit where that file was still present (not yet deleted):

git checkout OLD_HASH_WHERE_FILE_EXISTED
git_export_all_file_versions path/to/existing/file.ext

Otherwise the script will fail with "file does not exist". You don't have to switch to the very last commit where the deleted file was seen; any old commit that contains the file will do, and "git_export_all_file_versions" will then extract all versions (even from commits that are in the "future" relative to the commit you switched to).
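If you don't know a suitable old hash, one way to find one is sketched below. `last_commit_with_file` is a made-up helper name; the sketch assumes the path is given relative to the repository root and that the file was deleted by an ordinary (non-merge) commit.

```shell
# Print the newest commit (on any ref) in which the given path still exists.
# last_commit_with_file is a hypothetical helper name.
last_commit_with_file() {
    # newest commit that touched the path; for a deleted file this is the deletion itself
    last_touch=$(git rev-list -n 1 --all -- "$1")
    [ -n "$last_touch" ] || return 1
    if git cat-file -e "$last_touch:$1" 2>/dev/null; then
        # the file is still present in that commit
        echo "$last_touch"
    else
        # the commit deleted the file, so its first parent still has it
        git rev-parse "$last_touch^"
    fi
}
```

The result can be passed to `git checkout` in place of OLD_HASH_WHERE_FILE_EXISTED.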

Dmitry Shevkoplyas
  • The previously accepted answer (@sehe) indeed did not retrieve ALL versions at once. As mentioned in the comment, I used both commands to build a Java program (not a general solution that could be uploaded as is) that did it. Your solution is better, as it gives the end result of my past Java program. – max152 Jan 30 '17 at 09:51
  • This script works, but there are a few issues and possibly-unexpected behaviors. Please see [my answer](http://stackoverflow.com/a/43747334/1132502) for details and an updated script. – Nathan Arthur May 02 '17 at 21:14
  • @Nathan, awesome! glad you found it useful +1 – Dmitry Shevkoplyas May 03 '17 at 01:26
34

The script provided by Dmitry does actually solve the problem, but it had a few issues that led me to adapt it to be more suitable for my needs. Specifically:

  1. The use of git show broke because of my default date-format settings.
  2. I wanted the results sorted in date order, not reverse-date order.
  3. I wanted to be able to run it against a file that had been deleted from the repo.
  4. I didn't want all revisions on all branches; I just wanted the revisions reachable from HEAD.
  5. I wanted it to error if it wasn't in a git repo.
  6. I didn't want to have to edit the script to adjust certain options.
  7. The way it worked was inefficient.
  8. I didn't need the numbering in the output filenames. (A suitably-formatted date serves the same purpose.)
  9. I wanted safer "paths with spaces" handling

You can see the latest version of my modifications in my GitHub repo, or here's the version as of this writing:

#!/bin/sh
    
# based on script provided by Dmitry Shevkoplyas at http://stackoverflow.com/questions/12850030/git-getting-all-previous-version-of-a-specific-file-folder

set -e

if ! git rev-parse --show-toplevel >/dev/null 2>&1 ; then
    echo "Error: you must run this from within a git working directory" >&2
    exit 1
fi

if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
    echo "Usage: $0 <relative path to file> [<output directory>]" >&2
    exit 2
fi

FILE_PATH="$1"

EXPORT_TO=/tmp/all_versions_exported
if [ -n "$2" ]; then
    EXPORT_TO="$2"
fi

FILE_NAME="$(basename "$FILE_PATH")"

if [ ! -d "$EXPORT_TO" ]; then
    echo "Creating directory '$EXPORT_TO'"
    mkdir -p "$EXPORT_TO"
fi

echo "Writing files to '$EXPORT_TO'"
# '--' marks what follows as a path, so a file that no longer exists in the working tree is still accepted
git log --diff-filter=d --date-order --reverse --format="%ad %H" --date=iso-strict -- "$FILE_PATH" | grep -v '^commit' | \
    while read -r LINE; do \
        COMMIT_DATE=`echo $LINE | cut -d ' ' -f 1`; \
        COMMIT_SHA=`echo $LINE | cut -d ' ' -f 2`; \
        printf '.' ; \
        git cat-file -p "$COMMIT_SHA:$FILE_PATH" > "$EXPORT_TO/$COMMIT_DATE.$COMMIT_SHA.$FILE_NAME" ; \
    done
echo

exit 0

An example of the output:

$ git_export_all_file_versions bin/git_export_all_file_versions /tmp/stackoverflow/demo
Creating directory '/tmp/stackoverflow/demo'
Writing files to '/tmp/stackoverflow/demo'
...

$ ls -1 /tmp/stackoverflow/demo/
2017-05-02T15:52:52-04:00.c72640ed968885c3cc86812a2e1aabfbc2bc3b2a.git_export_all_file_versions
2017-05-02T16:58:56-04:00.bbbcff388d6f75572089964e3dc8d65a3bdf7817.git_export_all_file_versions
2017-05-02T17:05:50-04:00.67cbdeab97cd62813cec58d8e16d7c386c7dae86.git_export_all_file_versions
Sid
Nathan Arthur
  • Thanks for your update to the answer provided by Dmitry Shevkoplyas. For step 3, since the file is deleted, the user will have to create a blank version of the deleted file: deleted_file.extension to retrieve it. Otherwise, the following error will appear: `fatal: ambiguous argument 'deleted_file.extension': unknown revision or path not in the working tree.` – datalifenyc Apr 06 '20 at 13:57
  • @datalifenyc I'm not able to reproduce the problem you describe. In a test repo with a file named `test` that I've deleted (and made further commits), I'm able to run this script without error, both by supplying the path as `test` and as `./test`. I'm using git v2.24.1 on macOS. – Nathan Arthur Apr 06 '20 at 19:54
  • @nathan-author `macOS Catalina; bash 3.2.51; git 2.19.2` After the above script is run in the terminal $` – datalifenyc Apr 07 '20 at 16:25
  • 1
    I'm still not sure I understand the problem. The script isn't meant to handle a directory argument with no filename; it just does one file at a time. Please see [this example](https://www.rainskit.com/stash/git_export_example1.txt) of how I tested what I think you are describing. What are you doing differently? – Nathan Arthur Apr 08 '20 at 21:27
  • 1
    Thanks Nathan! That answers my question. I was thinking it could do either a file or a directory, since I saw the parameter ``. Individual files worked, so I couldn't figure out why directories weren't working. I appreciate your clarification and quick responses. – datalifenyc Apr 09 '20 at 12:52
  • 1
    Colons aren't safe in filenames on all systems. For safer date format, use `--date='format:%Y%m%d%H%M%S%z'` instead of `--date=iso-strict` (though note that sorting will not adjust for timezones in either case). – GPHemsley Jan 27 '21 at 20:49
  • 1
    It didn't work for me with errors on git cat-file `fatal: Not a valid object name hash:my_filename` . I used solution from here https://stackoverflow.com/questions/60480287/how-to-save-all-git-versions-of-a-file-to-disk.. – Valentas Oct 28 '21 at 14:54
  • The small change I did was to accept the git repository as the first parameter, that way you can have your script anywhere and run it against the repo you want – Arcanefoam Apr 29 '23 at 02:21
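The colon-free date format suggested in the comments above could be tried on its own before changing the script. This is only a sketch: `list_versions_safe_dates` is a made-up helper name, and the other `git log` options are copied from the script in this answer.

```shell
# Print "<date> <sha>" for every commit reachable from HEAD that touched the path,
# oldest first, using a compact date that contains no colons.
# list_versions_safe_dates is a hypothetical helper name.
list_versions_safe_dates() {
    git log --diff-filter=d --date-order --reverse \
        --date='format:%Y%m%d%H%M%S%z' --format='%ad %H' -- "$1"
}
```

Each output line has the same two fields that the script's `while read` loop splits with `cut`, so the result can be used in filenames on systems that reject colons.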
9
git rev-list --all --objects -- path/to/file.txt

lists all the objects (commits, plus the trees and blobs) associated with the repo path

To get a specific version of a file

git cat-file -p commitid:path/to/file.txt

commitid can be any of the following:

  • symbolic ref (branch, tag names; remote too)
  • a commit hash
  • a revision spec like HEAD~3, branch1@{4} etc.
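Put together, the two commands can write out every committed version of a file. The following is a rough sketch rather than a finished tool: `export_versions` is a made-up helper name, the path must be relative to the repository root, and `--objects` is left out so that only commit ids are listed.

```shell
# Write every committed version of a file into an output directory,
# one file per commit, named <commit>.<basename>.
# export_versions is a hypothetical helper name.
export_versions() {
    file_path=$1
    out_dir=$2
    mkdir -p "$out_dir"
    # without --objects, rev-list prints only the commits that touched the path
    git rev-list --all -- "$file_path" |
        while read -r commit; do
            # skip commits in which the file does not exist (for example, its deletion)
            git cat-file -e "$commit:$file_path" 2>/dev/null || continue
            git cat-file -p "$commit:$file_path" > "$out_dir/$commit.$(basename "$file_path")"
        done
}
```

For example, `export_versions path/to/file.txt /tmp/versions` would leave one copy of file.txt per commit in /tmp/versions.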
sehe
  • Ok great thanks! It took me time to understand (it's the first time I use git). I can now make a script that will reconstruct all the versions. – max152 Oct 12 '12 at 00:16
  • Nice expansion of the specifics of my generalized answer – gview Oct 12 '12 at 00:36
  • 3
    @user1739644 Are you by any chance trying to convert repositories? Have a look at `git fast-export --all errata.html` which has a well-documented, simple file format, supported by many other VCSes. – sehe Oct 12 '12 at 00:38
  • 1
    To write out all the versions of that file you can combine these commands like this: `git rev-list --all --objects -- some/path/file | cut -d ' ' -f1 | while read h; do (git cat-file -p $h:some/path/file > $h.file); done` – Tobias Jan 02 '17 at 09:56
0

Sometimes old versions of a file are only available through git reflog. I recently had a situation where I needed to dig through all the commits, even ones that were no longer part of the log because of an accidental overwriting during interactive rebasing.

I wrote this Ruby script to output all the previous versions of the file to find the orphaned commit. It was easy enough to grep the output of this to track down my missing file. Hope it helps someone.

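#!/usr/bin/env ruby
# set this to the path of the file, relative to the repository root
path_to_file = ""
`git reflog`.split("\n").each do |log|
  # the first field of each reflog line is the abbreviated commit hash
  commit = log.split(" ").first
  puts commit
  puts `git show #{commit}:#{path_to_file}`
  puts
end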

The same thing could be done with git log.
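For anyone without Ruby at hand, roughly the same loop can be sketched in shell. `show_reflog_versions` is a made-up helper name, and the path is assumed to be relative to the repository root.

```shell
# Print the file as it existed at every commit recorded in the reflog of HEAD,
# including commits that are no longer reachable from any branch.
# show_reflog_versions is a hypothetical helper name.
show_reflog_versions() {
    # %H prints the full hash; awk drops repeated hashes but keeps the order
    git reflog --format='%H' | awk '!seen[$0]++' |
        while read -r commit; do
            echo "== $commit"
            # the file may not exist at every commit, so failures are ignored
            git cat-file -p "$commit:$1" 2>/dev/null || true
            echo
        done
}
```

As with the Ruby version, the output can be piped through grep to find the commit that holds the missing content.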

rb-
-2

All the versions of a file are already in the git repo when you git clone it. You can create branches associated with the checkout of a particular commit:

git checkout -b branchname {commit#}

This might suffice for a quick and dirty manual comparison of changes:

  • Check out each branch
  • Copy to an editor buffer

This might be OK if you only have a few versions to be concerned with and don't mind a bit of manual work, albeit with built-in git commands.

For scripted solutions, there are already a couple of other solutions that were provided in other answers.

gview
  • Thank you! I didn't know that (first time I use git). I am now able to retrieve all versions. – max152 Oct 12 '12 at 00:18