
We have a Drupal website that we maintain in a git repo. Because the site is a public site but is for a private audience, we want to set the robots.txt to disallow all.

However, Drupal makes core updates from time to time, and these contain a robots.txt in the root of the site repo, amongst many other files. The core update is provided through a tar.gz file, not through a remote of another repo. Updates to Drupal core, then, show up as diffs, just as if you had edited the code yourself.

If someone updates Drupal core in our repo, they might overwrite our custom robots.txt, and we probably would not notice until results started showing up in Google again.

Is there a way to "fix" or pin the state of a file in a git repo? Or at least make it noisy to someone who goes to commit changes to it?

user151841
    http://stackoverflow.com/a/4710353/985949 – Mik378 Apr 20 '17 at 15:27
  • The accepted answer in the marked duplicate suggests using `--assume-unchanged`, but what you probably want is `--skip-worktree`, which is intended for this kind of situation. – 1615903 Apr 21 '17 at 06:53
  • @1615903 still a little confused, this answer http://stackoverflow.com/a/13631525/151841 says "is useful when you instruct git not to touch a specific file ever because developers should change it." But developers shouldn't change it (they will likely do so accidentally, unpacking Drupal core)-- unless they are specifically intending to do so. – user151841 Apr 21 '17 at 14:02
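To make the `--skip-worktree` suggestion concrete, here is a sketch run in a throwaway repo (repo setup is only for illustration; in a real clone you would run just the `update-index` commands against the already-tracked `robots.txt`):

```shell
# Demo in a scratch repo; in a real clone you would run only the
# update-index commands against the already-tracked robots.txt.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name you
printf 'User-agent: *\nDisallow: /\n' > robots.txt
git add robots.txt
git commit -qm 'pin robots.txt'

# Tell git to ignore worktree changes to this file in this clone.
# (--skip-worktree is sticky; --assume-unchanged is only a performance
# hint that git may silently drop.)
git update-index --skip-worktree robots.txt

# An accidental overwrite no longer shows up as a modification:
echo 'User-agent: *' > robots.txt
git status --porcelain robots.txt      # prints nothing

# Files with the bit set are tagged "S" by ls-files -v:
git ls-files -v robots.txt             # S robots.txt

# Clear the bit when you deliberately want to change the file:
git update-index --no-skip-worktree robots.txt
```

The main caveat, as the comments note, is that the bit is per clone, so every developer would have to set it themselves.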

2 Answers


Have you considered renaming the other robots.txt to a different name on your branch? That way, if the robots.txt file changes on the other branch and you merge, the changes would land on the renamed file and not on your robots.txt.

eftshift0
    ...? But if the file is renamed to something other than ROBOTS.TXT then it won't do what ROBOTS.TXT does... – Mark Adelsberger Apr 20 '17 at 15:29
  • Then I'm not following. You want to get changes from another branch but _not_ the changes they do on robots.txt and you want _your_ robots.txt to remain the same no matter what they do on the other branch? Or you just want to get a conflict and resolve it if they change the upstream robots.txt? Cause if that's the case, then that's the way git works "out of the box". – eftshift0 Apr 20 '17 at 15:36
  • 1
    you might want to pay attention to usernames, because I'm not the one who asked the question. I'm just pointing out that what you recommend *won't do what OP said was desired*. – Mark Adelsberger Apr 20 '17 at 15:38
  • hahaha :-D ok. Let's see what the feedback is. Thanks for taking the time to read, just in case. – eftshift0 Apr 20 '17 at 15:46
  • Hrm. Well, I just reread what you're suggesting and I withdraw my original objection, replacing it with these two instead: (1) It puts *way* too much trust in git's rename detection; (2) as OP indicates that the updates are manually applied from a .tgz, it wouldn't work. – Mark Adelsberger Apr 20 '17 at 15:58
  • About (2), there's no problem with that. I would keep upstream code "as is" on a separate branch so that I can then merge them on my own downstream branch. About (1), _perhaps_. I've tried with other situations like this (not too complex) and it works fine but you are right in terms of it _relying_ on git's rename detection. – eftshift0 Apr 20 '17 at 16:01
  • Well, committing through a branch would at least help (though it isn't apparently OP's current workflow, so you might want to add that to your answer). I guess on the other point, I'll just leave it that it puts more trust in rename detection that would make *me* comfortable. – Mark Adelsberger Apr 20 '17 at 16:04
  • I've been spending a little bit of brainpower on this problem. i think the best thing they could do is manage upstream changes "as is" on their own branch and merge from there. That way whenever they merge from upstream they would get a conflict on robots.txt and they would be able to set it to _their_ version disregarding changes from upstream. – eftshift0 Apr 21 '17 at 20:42
  • This solution won't work in this case. The repository is a Drupal site installation. Regularly there will be updates from Drupal to the core, and these occasionally include `robots.txt`. I am probably not going to be able to convince the Drupal organization to change the name of the stock `robots.txt` in what's intended to be a web root directory. – user151841 May 09 '19 at 16:47
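One way to automate the merge-time resolution eftshift0 describes (always keep *our* `robots.txt` when merging an upstream branch) is a `merge=ours` driver declared in `.gitattributes`. Two caveats: the driver only fires when *both* sides changed the file relative to the merge base (true here, since the site's `robots.txt` diverges from stock), and the `merge.ours.driver` config is per clone. A self-contained sketch:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name you
# The "ours" driver: a no-op command leaves our version as the merge result.
git config merge.ours.driver true
main=$(git symbolic-ref --short HEAD)

# Base commit: the stock robots.txt, plus the attribute naming the driver.
printf 'User-agent: *\nAllow: /\n' > robots.txt
echo 'robots.txt merge=ours' > .gitattributes
git add robots.txt .gitattributes
git commit -qm 'stock robots.txt'

# Upstream branch: a simulated core update rewrites robots.txt.
git checkout -qb upstream
printf '# updated by core\nUser-agent: *\nAllow: /\n' > robots.txt
git commit -qam 'core update'

# Our branch: the private-site robots.txt.
git checkout -q "$main"
printf 'User-agent: *\nDisallow: /\n' > robots.txt
git commit -qam 'disallow all'

# Both sides changed the file, so the driver runs and keeps our version.
git merge -q --no-edit upstream
```

After the merge, `robots.txt` still reads `Disallow: /` even though upstream rewrote it.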

Updated to reflect new info about how updates are received:

You could use a hook. Since you want this to be globally enforced you'd probably want to use a server-side pre-receive hook to reject any push that modifies robots.txt. (You could use client-side pre-commit hooks to catch it earlier, but this would have to be set up correctly for each clone of the repo so isn't as good a guarantee.)

As to how exactly to write the hook, I'd start by looking at the samples in `.git/hooks`; I think the pre-commit sample shows how to do a content-based reject.
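A minimal sketch of such a pre-receive hook (names and messages are illustrative; git feeds the hook one "old-sha new-sha ref-name" line on stdin per pushed ref):

```shell
#!/bin/sh
# Illustrative pre-receive hook: reject any push whose commits touch robots.txt.

ZERO=0000000000000000000000000000000000000000

# Return non-zero if the pushed range for one ref modifies robots.txt.
check_ref() {
    oldrev=$1 newrev=$2 refname=$3
    [ "$newrev" = "$ZERO" ] && return 0   # ref deletion: nothing to scan
    if [ "$oldrev" = "$ZERO" ]; then
        # New ref: scan only commits not already reachable from existing refs.
        touched=$(git rev-list "$newrev" --not --all -- robots.txt)
    else
        touched=$(git rev-list "$oldrev..$newrev" -- robots.txt)
    fi
    if [ -n "$touched" ]; then
        echo "push rejected: commits on $refname modify robots.txt" >&2
        return 1
    fi
}

# Entry point: git sets GIT_DIR when running hooks, so only read stdin then.
if [ -n "$GIT_DIR" ]; then
    while read old new ref; do
        check_ref "$old" "$new" "$ref" || exit 1
    done
fi
```

Install it as `hooks/pre-receive` in the server-side repository and make it executable. To allow deliberate changes you would need some escape hatch (for example, only enforcing the check for certain users), which is left out of this sketch.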

Mark Adelsberger