
I'm storing JSON result files from benchmark tests in a folder that should hold at most X files. Once the folder reaches X JSON files, the least recently added (oldest) file should be removed, so that the folder never holds more than X JSON files at one time.

I've implemented a solution similar to the accepted answer from this SO post. The problem is that ls does not return the files in the modification-time order that I expect.

When I run the following find command, I see that the modification times are all very close to each other. I did not change the files, so it must have something to do with the git pull that I ran. Note that I'll be running this on Jenkins, which creates a new workspace for every build.

$ find -type f -printf '%T+ %p\n' | sort
2021-08-03+10:49:13.8291325000 ./benchmark-result-4.7.2-10.json
2021-08-03+10:49:13.8391335000 ./benchmark-result-4.7.2-11.json
2021-08-03+10:49:13.8481332000 ./benchmark-result-4.7.2-12.json
2021-08-03+10:49:13.8591340000 ./benchmark-result-4.7.2-3.json
2021-08-03+10:49:13.8681350000 ./benchmark-result-4.7.2-4.json
2021-08-03+10:49:13.8751338000 ./benchmark-result-4.7.2-5.json
2021-08-03+10:49:13.8811401000 ./benchmark-result-4.7.2-6.json
2021-08-03+10:49:13.8891411000 ./benchmark-result-4.7.2-7.json

But I'd expect the order to be the following, because the first one committed was the 4.7.2-3 result file.

./benchmark-result-4.7.2-3.json
./benchmark-result-4.7.2-4.json
./benchmark-result-4.7.2-5.json
./benchmark-result-4.7.2-6.json
./benchmark-result-4.7.2-7.json
./benchmark-result-4.7.2-10.json
./benchmark-result-4.7.2-11.json
./benchmark-result-4.7.2-12.json

I've tried this command

$ git log --no-merges --first-parent --name-only --diff-filter=A --pretty=format: <branch_name> <directory_to_delete_from> | grep ".json"
benchmark-result-4.7.2-13.json
benchmark-result-4.7.2-12.json
benchmark-result-4.7.2-11.json
benchmark-result-4.7.2-10.json
benchmark-result-4.7.2-9.json
benchmark-result-4.7.2-8.json
benchmark-result-4.7.2-7.json
benchmark-result-4.7.2-6.json
benchmark-result-4.7.2-5.json
benchmark-result-4.7.2-4.json
benchmark-result-4.7.2-3.json
---------------------------- <- manually inserted
benchmark-result-4.7.2-17.json
benchmark-result-4.7.2-14.json
benchmark-result-4.7.2-13.json
benchmark-result-4.7.2-12.json
benchmark-result-4.7.2-11.json
benchmark-result-4.7.2-10.json
benchmark-result-4.7.2-9.json
benchmark-result-4.7.2-8.json
benchmark-result-4.7.2-7.json
benchmark-result-4.7.2-6.json
benchmark-result-4.7.2-5.json
benchmark-result-4.7.2-4.json
benchmark-result-4.7.2-3.json

While this does give me a list ordered by commit time, all of the benchmark results below the "----" no longer exist in the directory because they were previously deleted. I could reverse the list and delete from the top until only 10 files are left, but I don't love that solution because there is a chance that benchmark result files share the same name.

Is there a way to alter the above git log command so that it returns only files that still exist in the repository? Or is there another command that can solve my problem?

Note that the solution can be a bash script and does not need to fit on a single line.

terrabl
  • what do you think a "git commit time" is? – jhnc Aug 03 '21 at 16:49
  • "git commit time" to me would be the datetime of the commit. The newest files added to the folder would have the most recent commit time. – terrabl Aug 03 '21 at 16:59
  • `git log` does default to presenting in reverse chronological order, but the timestamp of any actual file created by a `pull` doesn't have to match the commit time (and probably won't). – jhnc Aug 03 '21 at 17:12

1 Answer


It looks like you've named them with numeric sequence numbers. To sort in that sequence, you're looking for sort -V: git ls-files \*.json | sort -V
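
As a rough sketch of how that version sort could drive the pruning, the helper below reads file names on stdin and prints the ones that would have to go so that at most a given number remain. The function name files_to_prune is made up for illustration, and the sketch assumes GNU sort (for -V) and GNU head (for a negative line count).

```shell
#!/usr/bin/env bash
# files_to_prune KEEP
# Hypothetical helper: reads file names on stdin (one per line) and prints
# every name except the KEEP highest ones in version order.
# Assumes GNU sort (-V) and GNU head (negative -n count).
files_to_prune() {
    local keep=$1
    # sort -V orders ...-3, ...-4, ...-10; head -n -K drops the last K lines.
    sort -V | head -n "-$keep"
}
```

It might be used along the lines of git ls-files \*.json | files_to_prune 10, with the printed names then passed to git rm. This only works while the sequence numbers in the names reflect the order in which the files were added.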

If you can't rely on the names, you have to post-process the log to find the latest entry for each file:

git log --pretty= --name-status -- \*.json \
| awk -F$'\t' '!seen[$2]++ && $1!="D"'

With a long history, you'll want to add some "am I done yet?" logic so the scan can stop early.
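
One possible shape for that early-exit logic, as a sketch rather than a definitive implementation: give awk the list of files that currently exist, and stop reading the log as soon as each of them has been seen once. The function name newest_first and the temporary list file are assumptions for illustration; rename entries (status R, which carry two paths) are not handled.

```shell
#!/usr/bin/env bash
# newest_first LIVE_LIST
# Hypothetical helper. stdin: output of `git log --pretty= --name-status`
# (newest commit first). LIVE_LIST: file with one currently tracked path
# per line, e.g. produced by `git ls-files`.
# Prints the live paths ordered from most to least recently touched.
newest_first() {
    [ -s "$1" ] || return 0          # no live files: nothing to rank
    awk -F'\t' '
        NR == FNR { live[$0] = 1; remaining++; next }   # load live paths
        ($2 in live) && $1 != "D" && !seen[$2]++ {
            print $2
            if (--remaining == 0) exit    # every live file ranked: done
        }
    ' "$1" -
}
```

A call might look like git ls-files \*.json > live.txt followed by git log --pretty= --name-status -- \*.json | newest_first live.txt. Once awk exits, git log receives a broken pipe and stops walking the history; the last lines of the output are the candidates to delete.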

jthill