I have a .csv file that I'd like to sort pre-commit. I found the pre-commit hook file-contents-sorter, which sorts the file's lines and so effectively orders them by the first value.

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.1.0
    hooks:
       - id: file-contents-sorter
         files: blackbox-files\.csv$

But I'd like to sort it first by the last value and then by the first value. Is something like this possible with pre-commit Git hooks?

My csv file is something like this:

aaaa,bbbb
sssss,bbbb
fff,bbb
kkkk,eeee
www,ddd

The above code gives me this output:

aaaa,bbbb
fff,bbb
kkkk,eeee
sssss,bbbb
www,ddd

I'd like this output:

aaaa,bbbb
fff,bbb
sssss,bbbb
www,ddd
kkkk,eeee
anthony sottile
abidishajia
You'll need to write a tool to do your specific sort; `file-contents-sorter` is intentionally simplistic and does not implement a CSV parser. Asking for tool suggestions on SO is off topic, as it tends to invite spammy, advertisement-like answers. – anthony sottile, Mar 22 '22 at 23:40

2 Answers

Based on a simple hack and sort's ability to sort by arbitrary columns, here's how you can do this using GNU sort:

  - id: sort
    name: Sort records
    entry: bash
    args:
      [
        -c,
        'for index in $(seq 0 "$#"); do LC_COLLATE=C.UTF-8 sort
        --field-separator=, --key=2,2 --key=1,1
        --output="${!index}" "${!index}"; done',
      ]
    files: blackbox-files\.csv$
    language: system

Basically, since sort can only sort one file in place per invocation (given several files, it combines them into a single output), we have to loop over every parameter. The loop runs over the indices 0 through $# rather than over $@, because with bash -c the first filename lands in $0, which $@ does not include. LC_COLLATE sets the locale used for collation, so that the sort order is the same on different machines. The remaining sort arguments sort by the second comma-separated field and then by the first.
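As a quick check of the loop and sort keys described above, this sketch runs the same kind of command by hand on two scratch files. The file names one.csv and two.csv and the rows in the second file are made up for illustration. LC_ALL=C and the short options stand in for LC_COLLATE=C.UTF-8 and the GNU long options, on the assumption that the C.UTF-8 locale may not be installed everywhere.

```shell
# Two hypothetical scratch files standing in for the filenames pre-commit passes.
dir=$(mktemp -d)
printf '%s\n' 'aaaa,bbbb' 'sssss,bbbb' 'fff,bbb' 'kkkk,eeee' 'www,ddd' > "$dir/one.csv"
printf '%s\n' 'z,2' 'y,1' 'x,2' > "$dir/two.csv"

# With `bash -c SCRIPT a b`, "a" becomes $0 and "b" becomes $1, so the loop
# runs from 0 to $# to reach every file. Each file is sorted in place by
# field 2, then field 1.
bash -c 'for index in $(seq 0 "$#"); do
  LC_ALL=C sort -t, -k2,2 -k1,1 -o "${!index}" "${!index}"
done' "$dir/one.csv" "$dir/two.csv"

first=$(cat "$dir/one.csv")
second=$(cat "$dir/two.csv")
printf '%s\n\n%s\n' "$first" "$second"
rm -rf "$dir"
```

On the question's sample rows this prints fff,bbb, aaaa,bbbb, sssss,bbbb, www,ddd, kkkk,eeee. Because bbb sorts before bbbb, the first two rows come out in the opposite order from the output the question asks for.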

l0b0
There's an open question here. You've got bbb sorting both before and after bbbb; the simplest explanation for that is that the first field overrides the second. But you've also got kkkk,eeee sorting after www,ddd; the simplest explanation for that is that the second field overrides the first. Maybe I just have a blind spot here, and it's kind of a side issue, so I'm going to use sort keys that produce the results you gave and go with that. They might not do what you want in every case, so check them.

The normal-ish way to clean up a file on the way in to the repo is with a filter. To get the results you show, run

git config filter.blackbox.clean 'sort -k2.1,2.1 -k1 -t,'
git config filter.blackbox.smudge cat

and

echo blackbox-files.csv filter=blackbox >>.gitattributes

then the file will be filtered on add. The filter definitions themselves have to be (re)configured in every repo that uses them, because git config settings are not carried along by a clone. This is a gross infelicity forced by the need to not invite code-injection attacks; sorry, but ILOVEYOU was not fun. Make a note in your README or add a couple of lines to your setup recipe: however you do your new-repo onboarding, this is another annoying step.
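To see what the clean filter's sort keys do before wiring them into Git, the same sort command can be run by hand on the question's sample rows. This is only a sketch: the LC_ALL=C prefix is an addition that is not part of the filter above, used here to keep byte-wise ordering the same across machines.

```shell
# The clean-filter command from above, fed the sample rows from the question.
# -k2.1,2.1 compares only the first character of field 2;
# -k1 then compares the whole line from field 1 onward.
filtered=$(printf '%s\n' 'aaaa,bbbb' 'sssss,bbbb' 'fff,bbb' 'kkkk,eeee' 'www,ddd' |
  LC_ALL=C sort -k2.1,2.1 -k1 -t,)
printf '%s\n' "$filtered"
```

This prints aaaa,bbbb, fff,bbb, sssss,bbbb, www,ddd, kkkk,eeee, which matches the output asked for in the question. Once the attribute line is in place, git check-attr filter blackbox-files.csv should report the blackbox filter for that path.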

jthill