89

Say I have a file named "HelloWorld.pm" in multiple subdirectories of a Git repository.

I would like to issue a command to find the full paths of all the files matching "HelloWorld.pm":

For example:

/path/to/repository/HelloWorld.pm
/path/to/repository/but/much/deeper/down/HelloWorld.pm
/path/to/repository/please/dont/make/me/search/through/the/lot/HelloWorld.pm

How can I use Git to efficiently find all the full paths that match a given filename?

I realise I can do this with the Linux/Unix find command, but I was hoping to avoid scanning all subdirectories for instances of the filename.

Newbie Git

7 Answers

125

git ls-files lists all files in the current state of the repository (the cache, also called the index). You can pass it one or more patterns to list only the files that match.

git ls-files HelloWorld.pm '**/HelloWorld.pm'
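As a minimal sketch of how the pattern behaves, the following builds a throwaway repository (the directory layout and the `demo`, `top`, and `found` variables are made up for illustration) and prints every tracked HelloWorld.pm as a full path. It assumes a Git version that supports the `:(glob)` pathspec magic, where a leading `**/` also matches the top-level directory, so the root file does not need to be listed separately.

```shell
#!/bin/sh
# Build a scratch repository with HelloWorld.pm at three depths (hypothetical layout).
demo=$(mktemp -d)
git init -q "$demo"
mkdir -p "$demo/but/much/deeper/down" "$demo/lib"
touch "$demo/HelloWorld.pm" "$demo/but/much/deeper/down/HelloWorld.pm" \
      "$demo/lib/HelloWorld.pm" "$demo/lib/Other.pm"
git -C "$demo" add .

# ls-files prints paths relative to the repository, so prefix the
# top-level directory to turn them into full paths.
top=$(git -C "$demo" rev-parse --show-toplevel)
found=$(git -C "$demo" ls-files -- ':(glob)**/HelloWorld.pm' |
    while IFS= read -r path; do printf '%s/%s\n' "$top" "$path"; done)
printf '%s\n' "$found"

# Clean up the scratch repository.
rm -rf "$demo"
```

Only files known to the index are listed; untracked files and files that exist only on other branches are not.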

If you would like to find a set of files and grep through their contents, you can do that with git grep:

git grep some-string -- HelloWorld.pm '**/HelloWorld.pm'
Brian Campbell
  • ls-files can also take a pattern. – Josh Lee Apr 15 '11 at 20:20
  • @jleedev Ah, right. Updated my answer to simplify it and fix a problem with the pattern in `git grep`. – Brian Campbell Apr 15 '11 at 20:25
  • (Annoyingly, it’s called a [pathspec](http://www.kernel.org/pub/software/scm/git/docs/gitglossary.html#def_pathspec) in gitglossary(7), but that term is not consistently used elsewhere.) – Josh Lee Apr 15 '11 at 20:34
  • 1
    Remember to use '**/HelloWorld.pm' instead of '*/HelloWorld.pm' to search any depth of the repository for matches. The OP's example has files at various levels. – John Rix Aug 13 '14 at 10:06
  • 11
    'git ls-files' does not list files in the repository. It lists file names in the index (staging area) or working tree. It's entirely normal for a file name to be somewhere in the repository but not in the index or working tree -- the file name might be on a different branch than the one you've currently checked out, for instance. The answer by @GregHewgill should be considered more correct here. – stevegt Dec 08 '14 at 15:50
  • 1
    (Missed the 5-minute comment edit window...) The answers by Uwe Geuder and Dean Hall essentially expand on Greg's, by iterating through all branches and tags, handling the case of files named on other branches (or that have been deleted). – stevegt Dec 08 '14 at 16:08
  • I had to run `git ls-files */HelloWorld.pm` (without the quotes). I'm on Windows for what it's worth. – Millie Smith May 08 '17 at 20:33
  • 1
    note that this won't find HelloWorld.pm at the root of your project. In that case you need to use `git ls-files 'HelloWorld.pm' '*/HelloWorld.pm'` – Chris Maes Aug 23 '18 at 09:49
  • I can't get the pattern search at the end of `git ls-files` to work at all. Piping it to `grep` works **much better**: `git ls-files | grep "my regex search for a filename"`. See: https://stackoverflow.com/a/24289481/4561887. – Gabriel Staples May 13 '20 at 01:04
  • I've updated my answer to take into account some of the feedback in the comments; added the top-level filename, and `**` to match arbitrarily deep. For those who are on Windows, how it works may depend on what shell you are using, whether you're using cmd.exe, powershell, or one of the various Unix shells for Windows like Git Bash, Cygwin, msys, or Windows Subsystem for Linux. I can't provide a good comprehensive answer for how it should work on Windows. @GabrielStaples, does it work better with the updates from the feedback? – Brian Campbell May 13 '20 at 06:22
  • @BrianCampbell, I'm on Ubuntu 18.04. Here are my results. Assume my file is called "FileName.cpp". `git ls-files | grep FileN` returns `path/to/FileName.cpp` with the FileN part highlighted. `git ls-files FileN` returns nothing. `git ls-files '**/FileN'` returns nothing. `git ls-files 'FileN**'` returns nothing. `git ls-files '*/FileN*'` returns `path/to/FileName.cpp` with no highlighting. That's a pain. I don't know how this pattern matching works. I do know how regex pattern matching in grep works. I'll just stick to the `git ls-files | grep FileN` style. It's much more consistent & reliable. – Gabriel Staples May 13 '20 at 21:11
  • "I don't know how this pattern matching works." Clarification: I don't know the *details* of how this pattern matching works. Clearly it's some sort of basic pattern matching where `*` is a wildcard, but I don't like that I can't just search a part of the filename and be done, like piping to `grep` allows, and that if I forget the wildcards I'll get no results and think the file doesn't exist! This makes it prone to user error. And again, I don't know the details of how that pattern matching works, and I'm sure it's nowhere near as powerful as grep, so, piping to grep is the best answer. – Gabriel Staples May 13 '20 at 21:26
47

Hmm, the original question was about the repository. A repository contains more than one commit (in the general case at least), but the answers given so far search through only one commit.

Because I could not find an answer that really searches the whole commit history, I wrote a quick brute-force script, git-find-by-name, that takes (nearly) all commits into consideration.

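#! /bin/sh
tmpdir=$(mktemp -td git-find.XXXX)
# single quotes: expand $tmpdir when the trap fires, and keep it quoted
trap 'rm -r "$tmpdir"' EXIT INT TERM

allrevs=$(git rev-list --all)
# well, nearly all revs, we could still check the log if we have
# dangling commits and we could include the index to be perfect...

# write one listing per revision; the file name is the commit hash
for rev in $allrevs
do
  git ls-tree --full-tree -r "$rev" >"$tmpdir/$rev"
done

cd "$tmpdir" || exit 1
# grep prefixes each hit with the file name, i.e. the commit hash
grep -- "$1" *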

Maybe there is a more elegant way.

Please note the trivial way the parameter is passed to grep: it is treated as a regular expression and will match parts of a filename. If that is not desired, anchor your search expression and/or add suitable grep options.

For deep histories the output might be too noisy. I thought about a script that converts a list of revisions into a range, like the opposite of what git rev-list can do, but so far it has remained a thought.
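A streaming variant avoids the temporary files altogether. The sketch below is an illustration, not the original script: it builds a scratch repository (the `demo` directory, the `commit` helper, and the file names are made up), lists each revision's tree with `--name-only`, filters it immediately with an anchored pattern, and prefixes each hit with the commit hash it was found in.

```shell
#!/bin/sh
# Scratch repository: HelloWorld.pm exists in the first commit and is deleted in the second.
demo=$(mktemp -d)
git init -q "$demo"
# Hypothetical helper so the demo does not depend on a configured identity.
commit() { git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }
mkdir -p "$demo/old"
touch "$demo/old/HelloWorld.pm" "$demo/README"
git -C "$demo" add . && commit "add files"
git -C "$demo" rm -q old/HelloWorld.pm && commit "remove HelloWorld.pm"

# For each commit reachable from any ref, list its tree and keep matching paths.
# Nothing is written to disk; each hit is printed as "<commit hash> <path>".
hits=$(git -C "$demo" rev-list --all |
    while IFS= read -r rev; do
        git -C "$demo" ls-tree --full-tree -r --name-only "$rev" |
            grep -E '(^|/)HelloWorld\.pm$' |
            sed "s/^/$rev /"
    done)
printf '%s\n' "$hits"

# Clean up the scratch repository.
rm -rf "$demo"
```

The file is reported even though it no longer exists in the latest commit. On a long history this still runs one `git ls-tree` per commit, so it can be slow.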

ulidtko
Uwe Geuder
  • Great script. However I was unable to use it because my git repo is so large that the script flooded my hard drive :( – Arne Böckmann Dec 12 '13 at 10:09
  • @ArneBöckmann Just move the grep command into the last loop and remove everything after each grep. – Uwe Geuder Dec 12 '13 at 14:25
  • 11
    Your code can be made into a one-liner: `git rev-list --all | xargs -I '{}' git ls-tree --full-tree -r '{}' | grep '.*HelloWorld\.pm$'`. This also solves the hard-drive flooding issue. – subhacom Feb 17 '16 at 06:11
  • 1
    @subhacom your oneliner should be the accepted answer – hobs Nov 12 '18 at 21:05
  • @subhacom I suggest you to put your reply into a separate answer, because your oneliner seems to be the best solution presently. – Exterminator13 Jun 13 '23 at 15:33
28

Try:

git ls-tree -r HEAD | grep HelloWorld.pm
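Note that an unanchored `grep HelloWorld.pm` treats `.` as "any character" and also matches longer names that merely contain the string. A hedged sketch of a stricter variant, run against a scratch repository whose file names are invented for the demonstration:

```shell
#!/bin/sh
# Scratch repository with two real matches and one near miss (hypothetical names).
demo=$(mktemp -d)
git init -q "$demo"
mkdir -p "$demo/lib"
touch "$demo/HelloWorld.pm" "$demo/lib/HelloWorld.pm" "$demo/lib/MyHelloWorld.pm.bak"
git -C "$demo" add .
git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m "demo"

# --name-only drops the mode, type, and hash columns, leaving one path per line.
# The pattern matches HelloWorld.pm only as a complete final path component.
matches=$(git -C "$demo" ls-tree -r --name-only HEAD | grep -E '(^|/)HelloWorld\.pm$')
printf '%s\n' "$matches"

# Clean up the scratch repository.
rm -rf "$demo"
```

Here `lib/MyHelloWorld.pm.bak` is filtered out, whereas the plain `grep HelloWorld.pm` would have reported it.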
Greg Hewgill
  • 1
    Or on Windows: `git ls-tree -r HEAD | findstr HelloWorld.pm` – John Rix Aug 13 '14 at 10:10
  • `man git ls-tree` shows that `-r` means "Recurse into sub-trees." I don't know what that means. Can you please explain what this means? – Gabriel Staples May 13 '20 at 01:01
  • @JohnRix, last I checked, if you're using the terminal provided by [Git for Windows](https://git-scm.com/download/win), which I *highly recommend* on Windows, it supports common Linux commands such as piping to `grep`, running bash scripts, etc., so this answer should work fine as-is. Try it out and let me know. I entirely ditched Windows for Ubuntu a couple years ago. – Gabriel Staples May 13 '20 at 01:06
  • @GabrielStaples, rightly or wrongly, I'm a bit of a curmudgeon when it comes to alternate terminals in Windows (perhaps partly on account of being browned off by CygWin many years ago), and tend to stick with the lowest common denominator that will always be available to me. (On the other hand, the release of WSL 2 on Windows 10 is imminent, and reports are it will work very efficiently, so perhaps I'll finally say goodbye to the old Windows command prompt!) – John Rix May 14 '20 at 22:51
  • By the way, `-r` should cause the ls-tree command to search through sub-directories in the repository. – John Rix May 14 '20 at 22:53
  • I had to Google [curmudgeon](https://www.wordreference.com/definition/curmudgeon). And regarding the `-r`, so you're saying that a "sub-tree" here simply means a "sub-directory" then? I wonder why they said "sub-tree" instead of "sub-directory", as I think "sub-directory" is much more clear. I was thinking maybe a "sub-tree" was a git commit hash (or git branch), & that recursing down the tree therefore meant it would do the file search from the head commit hash down, searching for the file recursively throughout each prior commit tree in the chain of commits. You sure that's not what it means? – Gabriel Staples May 14 '20 at 23:19
  • To delete files with a given name: git ls-tree -r HEAD --name-only | grep "HelloWorld.pm" | xargs rm – tschumann Jun 10 '22 at 01:53
9
git ls-files | grep -i HelloWorld.pm

The -i option makes grep case-insensitive.

Bull
  • I think this is the best answer for sure. See my comments under the most-upvoted answer: https://stackoverflow.com/questions/277546/can-i-use-git-to-search-for-matching-filenames-in-a-repository/5681657#5681657 – Gabriel Staples May 13 '20 at 21:32
4

[It's a bit of comment abuse, I admit, but I can't comment yet and thought I would improve @uwe-geuder's answer.]

#!/bin/bash
#
#

# I'm using a fixed string here, not a regular expression, but you can easily
# use a regular expression by altering the call to grep below.
name="$1"

# Verify usage.
if [[ -z "$name" ]]
then
    echo "Usage: $(basename "$0") <file name>" 1>&2
    exit 100
fi  

# Search all revisions; get unique results.
while IFS= read -r rev
do
    # Find $name in $rev's tree; --name-only prints just the path, so paths
    # containing spaces stay intact.
    grep -F -- "$name" \
        <(git ls-tree --full-tree -r --name-only "$rev")
done < \
    <(git rev-list --all) \
    | sort -u

Again, +1 to @uwe-geuder for a great answer.

If you're interested in the Bash itself:

Unless you can rely on how a for loop splits words (as when iterating over an array: for item in "${array[@]}"), I highly recommend using while IFS= read -r var ; do ... ; done < <(command) when the command output you're looping over is separated by newlines (or read -r -d '' when the output is separated by the NUL character $'\0'). While git rev-list --all is guaranteed to output hexadecimal strings without spaces (40 characters for SHA-1), I never like to take chances. I can now easily change the command from git rev-list --all to any command that produces lines.

I also recommend using built-in Bash mechanisms to inject input and filter output instead of temporary files.
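The NUL-delimited form of that loop can be sketched as follows. This is an illustration that requires Bash (arrays, `read -d`, and process substitution are not POSIX sh); the scratch repository, its file names, and the `paths` array are assumptions made for the demo.

```shell
#!/usr/bin/env bash
# Scratch repository containing a path with spaces (hypothetical layout).
demo=$(mktemp -d)
git init -q "$demo"
mkdir -p "$demo/dir with space"
touch "$demo/dir with space/HelloWorld.pm" "$demo/HelloWorld.pm"
git -C "$demo" add .

# ls-files -z terminates each path with a NUL byte instead of a newline,
# and read -d '' consumes input up to the next NUL, so every path arrives
# intact regardless of the whitespace it contains.
paths=()
while IFS= read -r -d '' path; do
    paths+=("$path")
done < <(git -C "$demo" ls-files -z -- '*HelloWorld.pm')

printf 'found: %s\n' "${paths[@]}"

# Clean up the scratch repository.
rm -rf "$demo"
```

Because the loop reads from a process substitution rather than a pipe, it runs in the current shell and the `paths` array is still populated after `done`.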

Dean Hall
  • 1
    Not sure why so much process substitution is being used, when you can simply pipe: `git rev-list --all | while read rev; do; git ls-tree --full-tree -r $rev | cut -c54- | fgrep -- "$name"; done | sort -u` – Simon Buchan Nov 24 '17 at 05:11
  • The script echoes the file, but not which revision it was found in. It's useful to also echo `$rev` to show which revisions it's found in. – LB2 Jul 02 '20 at 20:17
  • @SimonBuchan There is an extra `;` char after the `do`. – Giorgos Kylafas Dec 19 '22 at 13:59
1

The script by Uwe Geuder (@uwe-geuder) is great, but there really is no need to dump each ls-tree output, unfiltered, into its own file.

It is much faster and uses less storage to run grep on the output first and store only the matches, as shown in this gist

dirkjot
  • gists can change, and it's better to include the code snippet in your answer anyway for convenience, especially when it's short. I recommend you copy the code snippet from the gist to your answer. Just leave the link to the gist is all to cite it as the source in case you ever update the gist but not this answer. – Gabriel Staples May 13 '20 at 21:16
  • Now that I look at your script closer, I see this is actually really useful. But, your answer needs 1) a title: `# How to find a long-lost file by searching all commits`, and 2) the code from the gist directly pasted into this answer. – Gabriel Staples May 13 '20 at 22:13
1

@Uwe-Geuder's code can be made into a one-liner:

git rev-list --all | xargs -I '{}' git ls-tree --full-tree -r '{}' | grep '.*HelloWorld\.pm$'

This also solves the hard-drive flooding issue.
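A different technique, not the one used above, is to let `git log` walk the history and print only the names of changed files that match a pathspec. The sketch below assumes a scratch repository (the `demo` directory, `commit` helper, and file names are invented) and prints each matching path once, including a copy that was later deleted.

```shell
#!/bin/sh
# Scratch repository: two copies are added, then one is deleted.
demo=$(mktemp -d)
git init -q "$demo"
# Hypothetical helper so the demo does not depend on a configured identity.
commit() { git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }
touch "$demo/README"
git -C "$demo" add . && commit "initial"
mkdir -p "$demo/old"
touch "$demo/HelloWorld.pm" "$demo/old/HelloWorld.pm"
git -C "$demo" add . && commit "add HelloWorld.pm twice"
git -C "$demo" rm -q old/HelloWorld.pm && commit "delete one copy"

# --pretty=format: suppresses the commit headers, --name-only prints the
# changed paths, and the pathspec keeps only names ending in HelloWorld.pm.
# sed drops the blank separator lines; sort -u removes duplicates.
names=$(git -C "$demo" log --all --no-renames --pretty=format: --name-only \
        -- '*HelloWorld.pm' | sed '/^$/d' | sort -u)
printf '%s\n' "$names"

# Clean up the scratch repository.
rm -rf "$demo"
```

This reports a path only if some commit changed it, and it does not say which commits contained the file; add a format such as `--pretty=format:%h` if the revisions matter.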

subhacom