I want to remove all the files in a branch that a specific user did not edit.
What's the most robust way to do that? I was hoping there would be a git command for it but I'm thinking I might have to write a program.
Disclaimer:
git does not track the history of individual files. Actions such as "this file was moved" can only be guessed after the fact by comparing the content of files across the history, so you may have trouble linking a moved file to its correct author.
Depending on what you mean by "edited by a user", you may want to use any combination of fields to spot "that's a commit on which he worked".
The list of "authors" and "committers" in your repo history may not accurately indicate who really edited those files.
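To see which identities actually appear in the history before picking a filter, something like the following sketch may help. The function name list_identities is made up for this example; it simply prints every distinct author and committer as "Name <email>".

```shell
# Hypothetical helper (not a git command): list every distinct author and
# committer identity found in the history of all refs.
list_identities() {
    # %an/%ae = author name/email, %cn/%ce = committer name/email
    git log --all --format='%an <%ae>%n%cn <%ce>' | sort -u
}
```

The same person can show up under several names or emails, in which case you may need more than one --author filter in Step 1.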
You can do it in two steps:
Step 1: extract a list of files edited by the user
Here is one way to list all the files that appeared as "added" or "modified" by a given author in your repo :
$ git log --author="<NAME OR EMAIL>" --pretty="format:" \
--diff-filter=AM --name-only --all | sort -u
You can store that list of files on disk:
$ git log --author... | sort -u > /tmp/authored.txt
Step 2: once you have the list of files to keep, you can use git filter-repo to extract the part of the history that touches only these files
# work on a fresh clone of your repo:
git clone repo myclone
cd myclone
git filter-repo --paths-from-file /tmp/authored.txt
# the history of 'myclone' now contains only the files listed in /tmp/authored.txt
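If you would rather keep the history intact and only delete the other files in a new commit, a sketch along these lines could work. It is not part of git filter-repo: files_not_in_list is a name made up for this example, and it assumes the list file has one path per line, as produced in Step 1.

```shell
# Hypothetical helper: print the tracked files that are NOT in the keep list.
# $1: file with one path per line (for example /tmp/authored.txt)
files_not_in_list() {
    keep_sorted=$(mktemp)
    # comm needs both inputs sorted the same way, so force the C locale
    LC_ALL=C sort -u "$1" > "$keep_sorted"
    # -23 keeps only the lines unique to the first input (the tracked files)
    git ls-files | LC_ALL=C sort | LC_ALL=C comm -23 - "$keep_sorted"
    rm -f "$keep_sorted"
}

# Review the output first, then remove the files and commit, for example:
# files_not_in_list /tmp/authored.txt | while IFS= read -r f; do
#     git rm --quiet -- "$f"
# done
```

This only compares paths at the tip of the current branch, so a file the user edited under an older name would not be recognised.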
Further points:
as said in the "disclaimer" section, depending on what you intend with "edited", you may want to list more files in "Step 1":
git log --committer="<NAME>"
git log --grep="Co-Authored-by:.*[Ff]red"
Note that you can run as many different git log commands as you want to extract file names; you can always sort -u the combined result in the end.
This solution does not try to be smart about renamings, please update your question if you have an explicit need for that.
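Combining several git log queries, as described in the points above, could look like the sketch below. The function name list_touched_files is hypothetical, and treating author, committer and a Co-authored-by trailer as "edited" is only one possible definition.

```shell
# Hypothetical helper: list files added or modified in commits where the
# given name appears as author, committer, or in a Co-authored-by trailer.
# $1: name or email fragment to look for
list_touched_files() {
    name=$1
    {
        git log --all --author="$name" --pretty=format: --diff-filter=AM --name-only
        git log --all --committer="$name" --pretty=format: --diff-filter=AM --name-only
        # -i makes the --grep pattern case-insensitive
        git log --all -i --grep="Co-authored-by:.*$name" --pretty=format: --diff-filter=AM --name-only
    } | sed '/^$/d' | sort -u    # drop the blank separator lines, then deduplicate
}

# Example: list_touched_files "Fred" > /tmp/authored.txt
```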
Try this:
#!/bin/sh
# set the user name as it appears in the author field of git blame (a first name is enough)
user="<username>"
# pattern for the tracked files you need to check
file_filter="*.java"
# loop through each tracked file in the branch that matches the pattern
git ls-files -- "$file_filter" | while IFS= read -r file; do
    # --line-porcelain prints an "author <name>" line for every line of the file
    if git blame --line-porcelain -- "$file" | grep -q "^author .*$user"; then
        echo "Edited by user - $file"
    else
        # the user did not author any line in the file, remove it
        echo "Not edited by user - $file"
        git rm --quiet -- "$file"
    fi
done
If you only want to check the files that were changed in this branch, you can do as below:
#!/bin/sh
# set the user name as it appears in the author field of git blame (a first name is enough)
user="<user>"
# only look at the files changed on the current branch since it forked from the main branch
main_branch="<main branch>"
merge_base_commit=$(git merge-base HEAD "$main_branch")
# --diff-filter=d skips files deleted on the branch, since they cannot be blamed or removed
git diff --name-only --diff-filter=d "$merge_base_commit" HEAD |
while IFS= read -r file; do
    # --line-porcelain prints an "author <name>" line for every line of the file
    if git blame --line-porcelain -- "$file" | grep -q "^author .*$user"; then
        echo "Edited by user - $file"
    else
        # the user did not author any line in the file, remove it
        echo "Not edited by user - $file"
        git rm --quiet -- "$file"
    fi
done
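Both scripts treat a file as edited as soon as blame attributes a single line to the user. If you want a threshold instead, a sketch like this counts the attributed lines. lines_by_author is a hypothetical helper, and the match is a plain substring match on the author name.

```shell
# Hypothetical helper: count the lines of a file that git blame attributes
# to an author whose name contains the given string.
# $1: file path, $2: name fragment
lines_by_author() {
    git blame --line-porcelain -- "$1" |
        # only the "author <name>" header lines are counted; file content
        # lines start with a TAB in porcelain output, so they never match
        awk -v user="$2" '/^author / && index($0, user) { n++ } END { print n + 0 }'
}

# Example: keep a file only if at least 5 of its lines come from the user
# [ "$(lines_by_author path/to/File.java "Fred")" -ge 5 ] || git rm -- path/to/File.java
```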