
I have set up a GitHub Action that is supposed to amend the commit it was triggered on. It looks something like this:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: Do some computations
        run: |
          # Create some new files that should be added to the commit
          dvc repro
      - name: Commit Results
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"

          git add .
          git commit --amend --no-edit

          git push --force

This works fine when there are no updates to the branch while the action is running. It becomes more complicated when several runs of the action overlap, started with a small offset, and I try to combine their results with the following:

      - name: Commit Results
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"

          CIREVISION=$(git rev-parse HEAD)

          # stash changes
          git add .
          git stash

          # update for new commit
          git pull --rebase

          # start rebase only on the commit of this CI run
          GIT_SEQUENCE_EDITOR="sed -i -re '0,/pick/{s/pick/edit/}'" git rebase -i "$CIREVISION^"

          # update files and commit
          git stash pop
          git add .
          git commit --amend --no-edit

          # finish rebase <- this fails with a merge conflict
          git rebase --continue

          git push --force

which, when the only changed file is metrics.json, fails with:

Auto-merging metrics.json
CONFLICT (content): Merge conflict in metrics.json
Rebasing (2/3)
error: could not apply ABCD123... TEST01
hint: Resolve all conflicts manually, mark them as resolved with

I tried something like `git status | grep both | awk '{print $3}' | xargs git checkout --ours .` to skip merging and always accept the version of the file on the current branch, but I still could not resolve the merge conflict.
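A likely reason the `--ours` attempt did not help: during a rebase git swaps the usual meaning of the two sides, so `--ours` is the branch being rebased onto and `--theirs` is the commit currently being replayed. A checked-out file also has to be staged with `git add` before `git rebase --continue` accepts it. The following is a minimal sketch, not a tested fix for this exact workflow, that takes one whole side of every conflicted file at the current rebase stop. The helper name `resolve_conflicts_with` is hypothetical, and which side you want depends on where your results ended up:

```shell
#!/bin/sh
# Hypothetical helper: resolve every conflicted path of the current rebase stop
# by taking one whole side of the file, then let the rebase continue.
#   ours   = the branch being rebased onto (e.g. the freshly pulled upstream)
#   theirs = the commit currently being replayed
resolve_conflicts_with() {
  local side="$1"
  case "$side" in
    ours|theirs) ;;
    *) echo "usage: resolve_conflicts_with ours|theirs" >&2; return 2 ;;
  esac
  # --diff-filter=U lists only unmerged (conflicted) paths.
  git -c core.quotePath=false diff --name-only --diff-filter=U |
    while IFS= read -r file; do
      git checkout "--$side" -- "$file"  # take that side's whole file
      git add -- "$file"                 # mark the path as resolved
    done
  # GIT_EDITOR=true keeps the existing commit message instead of opening an editor.
  GIT_EDITOR=true git rebase --continue
}
```

If a later commit in the rebase conflicts as well, `git rebase --continue` stops again with a non-zero exit status, and the helper can simply be called once more. To prefer one side only for conflicting hunks rather than whole files, `git rebase -X theirs` (or `-X ours`) applies the same preference automatically.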

A little more context:

I am using GitHub CI together with cml.dev to version-control research data and make it fully reproducible. I update a parameter file, which triggers some computations; their results are saved externally, but some small metrics should be stored in git together with the matching parameters. Having a commit with the updated parameters but the old metrics is not feasible, so I must change the original commit.

I am aware of the consequences of force-pushing to a repository and I am willing to take the risk.

I usually use git for software development and have never looked into rewriting existing history, because there I normally don't want to.

PythonF
  • The problem seems due to running jobs in parallel. Wouldn't it be an option to run them in sequence? – GuiFalourd Nov 26 '21 at 11:56
  • Running the jobs in sequence would not be enough on its own: I would also have to block pushes to the branch while the action is running, because otherwise a commit would be lost through the force push. – PythonF Nov 26 '21 at 12:04
  • You could add a push step in a dedicated job that runs after all the other jobs have finished (using the `needs` field). But even then I think you will get the metrics file conflict (as many jobs will update the same file). Did you try this to check? – GuiFalourd Nov 26 '21 at 12:54
  • I have not yet, but I will look into it. Shouldn't it be possible, at each rebase step, to force-accept the new file instead of trying to merge it? – PythonF Nov 26 '21 at 13:36
  • I'm not sure. From what I understand, since your parallel jobs update the same file at almost the same time, the `git pull --rebase` should perhaps happen before the `git add .` to avoid the conflict. But I'm not sure how that will behave in the workflow. – GuiFalourd Nov 26 '21 at 13:57
  • @PythonF cml has the `cml pr` command, which creates a PR against the experiment branch. You could use `gh pr merge` if you would like to try to merge it automatically... not recommended either, in my opinion. – David G Ortega Nov 29 '21 at 15:16
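On the worry about losing a commit through the force push: a swapped-in technique not mentioned in the thread is `git push --force-with-lease=<ref>:<expected>`, which only overwrites the remote branch if it still points at the expected commit. A concurrent push then makes this run fail loudly instead of being silently discarded. A minimal sketch, assuming the run starts on the branch tip it was triggered by; the helper name `amend_and_push_with_lease` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: amend the commit the CI run started on with the current
# working tree and force-push it, but only if the remote branch still points at
# that commit. If someone pushed in the meantime, the push is rejected instead
# of silently discarding their commit.
amend_and_push_with_lease() {
  local remote="$1" branch="$2" expected
  # The commit this run started on; the remote branch should still point here.
  expected=$(git rev-parse HEAD)
  git add -A
  git commit -q --amend --no-edit
  # --force-with-lease=<ref>:<expected> makes the force-push conditional.
  git push -q --force-with-lease="$branch:$expected" "$remote" "HEAD:refs/heads/$branch"
}
```

In the workflow, the "Commit Results" step could call it as `amend_and_push_with_lease origin experiment`. A rejected push still needs its own handling, for example re-running the job on the new branch tip.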

1 Answer


If all you need is to commit the current state regardless of upstream changes, then fetch and reset:

BRANCH=experiment
git fetch --all
git reset --mixed origin/$BRANCH
git add .
git commit -m "a new experiment"
git push origin HEAD:$BRANCH

Naturally, the push may still fail if someone else pushed in the meantime, so the whole thing should be wrapped in a retry loop. This avoids having CI do scary `--force` pushes.
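The retry loop could look roughly like this. It is a sketch, assuming the computed results are already in the working tree; the helper name `commit_on_top_with_retry` and the default of five attempts are hypothetical. Like the `git add .` above, it records the working tree exactly as it is, so upstream edits to tracked files that are missing from the working tree are undone in the new commit (they remain in history):

```shell
#!/bin/sh
# Sketch of the fetch/reset/commit/push sequence wrapped in a retry loop.
# Hypothetical helper; assumes the working tree already holds the results to commit.
commit_on_top_with_retry() {
  local remote="$1" branch="$2" message="$3" attempts="${4:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    git fetch -q "$remote"
    # Point HEAD at the latest remote tip but keep the working tree as it is.
    git reset -q --mixed "$remote/$branch"
    git add -A
    # Working tree already matches the remote tip: nothing to push.
    if git diff --cached --quiet; then
      return 0
    fi
    git commit -q -m "$message"
    # Plain (non-force) push: rejected if someone pushed after our fetch.
    if git push -q "$remote" "HEAD:refs/heads/$branch"; then
      return 0
    fi
    echo "push rejected (attempt $i of $attempts), retrying" >&2
    sleep "$i"
    i=$((i + 1))
  done
  return 1
}
```

For example, `commit_on_top_with_retry origin experiment "a new experiment"`. The growing `sleep` gives a concurrent run time to finish its own push before the next attempt.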

casper.dcl