20

For some years now, I've been waiting for Subversion to feature a "delete permanently" (obliterate) function. I hesitate to make the transition to Subversion (coming from Visual SourceSafe :p), because I think this is an essential feature; without it, I'd expect the repository to grow unstoppably. However, for one reason or another, the feature gets postponed over and over again. So I'm beginning to wonder whether there is some other feature or workaround that makes the obliterate function dispensable.

What do you do when you want to shrink the SVN central repository?

Example 1: I check in a large third party library, and after a few weeks I realize it is not suited for my needs. I don't want to store and back up that large amount of data forever.

Example 2: I have 10 versions of 10 big third party libraries in the repository, but I only use the latest versions.

Example 3: I accidentally checked in sensitive information (as suggested by John).

Example 4: I accidentally checked in some big files that were never meant to be put in the repository.

Dimitri C.
  • You could, of course, add this feature. Subversion is open source. They may not accept it as a contribution, but that doesn't mean you don't get the benefit. – T.J. Crowder Mar 11 '10 at 15:15
  • And what if you check in 10 versions of the Boost library and use only one? And you would like to make multiple backups of the repository? – Dimitri C. Mar 11 '10 at 15:16
  • @Crowder: You're right, I'll do that! :-) – Dimitri C. Mar 11 '10 at 15:18
  • I'd imagine duplicates can be merged somehow perhaps? – Mr. Boy Mar 11 '10 at 15:19
  • 1
    @Dimitri: I did say *could*, as opposed to *should*. Some good arguments here for why Subversion (and source control in general) doesn't/shouldn't have this feature. I'd say: Wait until you really, really need the feature before spending your time on it, you probably have more important things to do. :-) (Unless you're keen to learn the internals of Subversion, of course.) – T.J. Crowder Mar 11 '10 at 15:21
  • 7
    I don't think disk-space is an issue. Being able to remove files for other reasons is. – Mr. Boy Mar 11 '10 at 15:21
  • 4
    Imagine somebody accidentally checked in very confidential data. What can a subversion user do about that? Nothing! The Information will remain *forever* in your repository x-D – codymanix Mar 11 '10 at 15:22
  • The main reason VSS has such a feature is its own unreliability - it just cannot handle large repositories in a reliable manner. Subversion can. – Marc Wittke Mar 11 '10 at 15:32
  • 2
    "Imagine somebody accidentally checked in very confidential data. What can a subversion user do about that? Nothing! The Information will remain forever in your repository x-D" Except that it won't.... there are ways to remove it. It's just not easy (nor should it be). – Eric H. Mar 11 '10 at 15:44
  • 1
    Being able to permanently delete something from VSS has been nothing but a nightmare for my team. You think it's safe to delete something that is sufficiently old, but Murphy's law mandates that as soon as you do so, that data will become vital. In VSS, it can also screw with the ability to pull the history of a file/folder. One of the main design points of Subversion is that it is fully auditable, and allowing a "permanent delete" command would violate that. For that reason, I doubt the Subversion project would accept a patch for an "obliterate" command. – bta Mar 11 '10 at 15:49
  • @codymanix: If that happens, the svn admin can revert the checkin, and the team can publicly humiliate the offending developer for not paying attention to what they were doing. Deleting is a major operation that has side effects; it is better if such power were reserved for the admin and not available to the common user. – bta Mar 11 '10 at 15:52
  • 1
    Disk space can be an issue, particularly if somebody checks in enough gigabytes of stuff by mistake, or if the repository has to be backed up remotely (in which case additional storage potentially means a lot of additional time to back up). The other use case is confidential data being checked in by mistake. – David Thornley Mar 11 '10 at 15:53
  • 1
    It can still be an easy-to-use feature, only given to admin-level users. – Mr. Boy Mar 11 '10 at 15:53
  • 4
    @bta: Except that the sysadmin cannot revert the checkin, except by dumping, filtering, and restoring the repository. I don't want the `obliterate` feature available to the regular user, but it would be nice if the admin had it. – David Thornley Mar 11 '10 at 15:54
  • 8
    About your "What do you do when you want to shrink the SVN central repository?" Question... Do what my company did: Let the Repo Server have a HDD crash without backups. It's the best way to a small repo and follows the obliterate principle quite well. –  Mar 11 '10 at 20:39
  • 3
    Disk space may be relatively cheap, locally. But that space increases tremendously when your company needs to do backups on a weekly basis for disaster recovery. We have 4 copies plus 7 days a week backup. When the 3D team accidentally checks in a large 3D model, it permanently breaks our ability to back up the repository! – TamusJRoyce Feb 06 '12 at 04:18

13 Answers

18

There is a fair amount of discussion of svn obliterate on the problem ticket at the Apache Subversion site, most of it ending about 2008. There seems to be general agreement that it's a good capability to have, although its use should be rare.

There are two main reasons to want it.

First, checking in confidential information can be a problem. Leaving it in there, deleted, is not necessarily an option, depending on the level of confidentiality and exposure of the repository.

Second, checking in a large amount of stuff that shouldn't be checked in can drastically increase the size of the repository. Disk space is generally cheap nowadays, but it isn't unlimited, and there are other ways file space can matter. If it's necessary to send a repository over a net connection, that's extra time which may or may not be important. There can be real advantages to being able to burn a CD-ROM or DVD-ROM that contains the whole repository.

Therefore, it's a useful capability which is currently done by dumping, filtering, and reloading the repository. This is error-prone according to reports I've seen, can be slow, and requires shutting down the repository.
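As a rough illustration of the dump, filter, and reload workaround, here is a minimal Python sketch that builds and chains the `svnadmin` and `svndumpfilter` commands. The function names and the example paths are hypothetical, the `--drop-empty-revs` flag is optional, and the sketch assumes the old repository is taken offline first and that the result is loaded into a brand-new repository rather than in place. Treat it as a starting point under those assumptions, not a tested procedure.

```python
import subprocess
from typing import List, Sequence


def build_obliterate_pipeline(
    old_repo: str, new_repo: str, unwanted_paths: Sequence[str]
) -> List[List[str]]:
    """Return the commands for a dump -> filter -> load cycle, in order.

    The function name and arguments are illustrative, not part of Subversion.
    """
    if not unwanted_paths:
        raise ValueError("at least one path prefix to exclude is required")
    return [
        # 1. Create the empty repository that will receive the filtered history.
        ["svnadmin", "create", new_repo],
        # 2. Write the full history of the old repository to stdout.
        ["svnadmin", "dump", "--quiet", old_repo],
        # 3. Drop every node under the unwanted path prefixes; revisions that
        #    become empty as a result are dropped as well.
        ["svndumpfilter", "exclude", "--drop-empty-revs", *unwanted_paths],
        # 4. Read the filtered stream from stdin into the new repository.
        ["svnadmin", "load", "--quiet", new_repo],
    ]


def run_obliterate_pipeline(commands: List[List[str]]) -> None:
    """Create the new repository, then stream dump | filter | load."""
    create, dump, filt, load = commands
    subprocess.run(create, check=True)
    dump_proc = subprocess.Popen(dump, stdout=subprocess.PIPE)
    filt_proc = subprocess.Popen(filt, stdin=dump_proc.stdout, stdout=subprocess.PIPE)
    load_proc = subprocess.Popen(load, stdin=filt_proc.stdout)
    # Close our copies of the pipe ends so upstream processes see a broken
    # pipe if a downstream process exits early.
    dump_proc.stdout.close()
    filt_proc.stdout.close()
    for proc in (load_proc, filt_proc, dump_proc):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

A call such as `run_obliterate_pipeline(build_obliterate_pipeline("/srv/svn/old", "/srv/svn/new", ["trunk/vendor/biglib"]))` would then produce a second repository without the excluded path. Revision numbers in the new repository can differ from the old ones, so existing working copies should be checked out afresh.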

Obviously, it's not a high-priority feature for the Subversion team, given that what it has needed for quite a few years is somebody to do the work of coming up with a design and implementing it. After all, it should be done very rarely, and there is a workaround. However, anybody who wants to do a whole lot of work on Subversion could provide a patch that would (if of good enough quality) probably be accepted.

David Thornley
  • 2
    If you think of the feature as `undo` rather than `obliterate`, use-cases become more obvious; you only need one inexperienced user to commit several 100Mb of .obj/.pdb files. – Mr. Boy Feb 10 '12 at 16:48
  • 1
    Offsite backups are also a problem. It takes just one idiot checking in some crazy-sized file and you suddenly don't make your backup window anymore – Marco van de Voort Nov 18 '12 at 14:33
11

It violates the meaning of source control.
Source control is all about being able to restore a previous state. If you delete a file permanently you won't be able to.

OTOH I do not know VSS, so I might have misunderstood "delete permanently".

  • 1
    Eventually, you reach a point where you don't want to go back that far anymore; at that point, the previous data is pure chaff and can (arguably should) be tossed. Agreed you don't reach that point quickly, or take the decision lightly. – T.J. Crowder Mar 11 '10 at 15:16
  • 9
    What if you accidentally commit some personal data? like letting all the devs see your evaluation comments on each other or salaries? That can be illegal in some cases, what do you do? – Mr. Boy Mar 11 '10 at 15:18
  • 1
    +1 Delete permanently is not something source control should do. I'd go further and argue that if system X allows permanent deletion then system X is not a source control system. The increase in disk space used by the repository is, furthermore, one of the weakest arguments in favour of permanent deletion – High Performance Mark Mar 11 '10 at 15:20
  • 2
    @John: Ask the admins to remove it (which is possible and not that hard). This should be the exception, not the normal case. If you fail to use source control correctly it's not the fault of the software. –  Mar 11 '10 at 15:20
  • How do they do that? By what I've heard this is a horrible task. – Mr. Boy Mar 11 '10 at 15:22
  • 1
    SVN is a tool. It doesn't tell me how to work, at least it shouldn't. You don't like people forcing you to do things their way in real life, do you, when they think they know best? – Mr. Boy Mar 11 '10 at 15:23
  • 1
    It's more effort than simply doing "right-click => obliterate" but making mistakes like that _has to_ hurt so you won't make it again (and next time maybe don't even notice it or only delete half of the data). Still, the last time I had to do that, about 2 years ago, it took me... about 3.5 minutes. –  Mar 11 '10 at 15:24
  • @John: It's a simple question of "does it make sense". Does it make sense to spend all the developer hours for a feature that solves a problem that shouldn't exist? –  Mar 11 '10 at 15:26
  • @dbemerlin: So you're saying that in effect, the feature exists. It's admin-level (as it should be), but it's there. – T.J. Crowder Mar 11 '10 at 15:27
  • 2
    no. You have to manually rip the repo apart based on what I was told. – Mr. Boy Mar 11 '10 at 15:37
  • 2
    @dbmerlin... if you design software that deliberately hurts people to force them to use the software as you want, you've got problems – Mr. Boy Mar 11 '10 at 15:38
  • @John: fair to say, but if a user doesn't like a piece of software because it deliberately hurts him, then he is free not to use it. Which is exactly what the OP does, and I think it's only fair to let the software makers decide all by themselves whether or not they've got problems by not having someone as a user. – ЯegDwight Mar 11 '10 at 16:26
  • 1
    Another approach toward sensitive data is to ACL the file out of visibility. – Yuliy Mar 11 '10 at 23:24
  • 3
    This answer is only a part of the full story. Real-world SVN usage has valid use-cases for obliterate that must be considered. – usr Oct 23 '12 at 18:54
8

The obvious reason against it is that the developers think it will on balance make SVN worse - the happiness you feel at being able to prune un-needed stuff will be dwarfed by your anger when you accidentally obliterate something and your /trunk goes missing.

FogBugz has exactly the same behavior, and in their case it's entirely by design I believe, protecting users from themselves.

Mr. Boy
7

Quoting Subversion Obliterate, the forgotten feature, there are three components to the question: the problem, the reason and the solution. Since you started by asking about the solution, I'll start with that.

Solution

As you noticed, there is no great solution, especially if you are dealing with a big corporate repository, since the solution becomes harder the bigger the repo gets. There's a feature called dump / filter through which you can clean out your repo of stuff you don't want, but it is not that easy to use, not fast and not reliable.

There has been a small effort (follow the thread) on the svn team to get an obliterate feature in there after 2008, but the effort died a silent death.

The problem

The article I mentioned at the start actually has a good list of use cases where one would need an obliterate command, and in the issue 516 thread the developers actually acknowledged its merit.

Alas, it seems too late for that now; the real reason it was never added later is that it is now nigh impossible to implement, as it hooks into the code at the most fundamental level (also see the small effort link under Solution).

From the FAQ entry:

Revisions are immutable trees which build upon one another. Removing a revision from history would cause a domino effect, creating chaos in all subsequent revisions and possibly invalidating all working copies.

The reason

The problem is that originally the obliterate feature was dismissed because it did not conform to the principle of true version control.

Again from the FAQ entry:

How do I completely remove a file from the repository’s history? There are special cases where you might want to destroy all evidence of a file or commit. (Perhaps somebody accidentally committed a confidential document.) This isn’t so easy, because Subversion is deliberately designed to never lose information.

However

I've worked with SVN for a lot of clients now, with larger teams and larger projects, and basically never had a real issue. Yes, the use cases mentioned warrant an obliterate feature, but so far I'm not convinced that this is a problem that you have over and over again everywhere you go. Of course, the nature of this particular problem is that you only have to make a mistake once and it can't be undone properly.

Benny Bottema
7

Obliterate violates the version control principles that you'd want to have. Either you wouldn't save any space, or previous tags would become broken. You would not be able to go back to a true previous version if you had obliterated any files.

As for your comment about the repository growing... Any repository will grow linearly with the size of changes over time. That's the whole point of a source control system. If you don't need to be able to track prior versions, then why not just stick to a shared folder somewhere?

Yuliy
  • 5
    oh yes, return to the golden days of .old, .older, .oldest, .bak, .backup, .deleted, .obsolete and ~. Those were the days... feels like yesterday. I still have nightmares... –  Mar 11 '10 at 15:18
5

It is possible to reduce the size of a SVN repository by doing a dump and load. Essentially if you say that you never want to revert to something more than a couple years old it is possible to dump the repository, filter based on time, then reload the dump. Wanting to get rid of a single file due to size is probably an indication that the file didn't really belong in a source control system in the first place.
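A minimal sketch of that age-based variant, assuming you already know the first revision you want to keep (the helper name and its arguments are hypothetical): `svnadmin dump -r N:HEAD` without `--incremental` writes revision N as a full snapshot of the tree, so everything older is collapsed into that first revision when the dump is loaded into a fresh repository.

```python
from typing import List


def build_truncate_commands(
    old_repo: str, new_repo: str, first_kept_rev: int
) -> List[List[str]]:
    """Commands that copy history from first_kept_rev onward into new_repo.

    Hypothetical helper: run the dump command with its stdout piped into
    the load command's stdin.
    """
    if first_kept_rev < 1:
        raise ValueError("first_kept_rev must be a positive revision number")
    return [
        ["svnadmin", "create", new_repo],
        # Without --incremental, the first dumped revision is a full copy of
        # the tree as it stood at that revision, not a delta.
        ["svnadmin", "dump", "--quiet", "-r", f"{first_kept_rev}:HEAD", old_repo],
        ["svnadmin", "load", "--quiet", new_repo],
    ]
```

Two caveats apply: revisions in the new repository are numbered from 1 again, and copies (branches, tags) that refer back to revisions older than the cut-off may not load cleanly, so it is worth trying this on a copy of the repository first.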

tloach
  • 1
    On that note, why are you checking third-party libraries into your repository? If you absolutely must keep them in your system, have a separate repository for third-party libs and use `externals` to link them into your source tree. – bta Mar 11 '10 at 16:12
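The `externals` suggestion in the comment above could look roughly like the following sketch. It formats an `svn:externals` property value in the Subversion 1.5+ syntax (`[-rREV] URL LOCALDIR`, one definition per line) and builds the matching `svn propset` command. The helper names, the repository URL, and the directory names are all made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# (url, local_dir, pinned_revision or None); the tuple layout is an assumption
# of this sketch, not a Subversion concept.
External = Tuple[str, str, Optional[int]]


def externals_value(entries: Sequence[External]) -> str:
    """Format entries as an svn:externals value, one definition per line."""
    lines = []
    for url, local_dir, rev in entries:
        # Pinning a revision keeps the library fixed until someone bumps it.
        prefix = f"-r{rev} " if rev is not None else ""
        lines.append(f"{prefix}{url} {local_dir}")
    return "\n".join(lines)


def propset_command(value: str, target_dir: str) -> List[str]:
    """Build the svn command that sets the property on target_dir."""
    return ["svn", "propset", "svn:externals", value, target_dir]
```

For example, `propset_command(externals_value([("http://svn.example.com/thirdparty/boost", "boost", 148)]), "libs")` yields a command that, once run and committed, makes checkouts of `libs` pull `boost` from the separate third-party repository. A large library can then be dropped by retiring that repository instead of rewriting the history of the main one.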
4

Because removing data from the repository breaks the basic premise of source control, that being that it is possible to reproduce all previous states and changes to the source tree. If you want to obliterate something from version control, you're probably "Doing It Wrong", as they say.

Sparr
4

There is some scripting which helps you obliterate data. Follow this mailing list thread for more info.

It's a hard way to do it as the essence of version control is not losing data, as opposed to deleting it permanently. But if you prune once a year or something like that it can be done.

extraneon
  • Can you give an estimation about how long such an operation typically takes? – Dimitri C. Mar 11 '10 at 15:44
  • Not really, but it involves modifying the tables, and then dumping and reading the repository. So it won't be fast but might possibly be automated. – extraneon Mar 12 '10 at 09:00
3

The entire point of source control is to have a complete history of what your repository looks like. The obliterate command defeats this purpose of source control, and it's a misfeature in all version control systems that have it.

SVN has cheap copying and cheap branching that doesn't require a full copy of the file--just the changed bits. Its central repository is usually very manageable in size, making this misfeature unnecessary.

JSBձոգչ
  • 1
    On the other hand, what about the argument that "It's my damn repository, I should be able to do what I want"? _Should_ the software be able to decide for you? – Mr. Boy Mar 11 '10 at 15:21
  • 2
    @John: but it most likely isn't your own repository, you are sharing it with other people who rely on the fact that revision X is really revision X. Nothing stops you from forking SVN and giving it another name, but people will probably prefer the "safe" version. – Otto Allmendinger Mar 11 '10 at 15:44
  • 1
    The other people will do what I tell them ;) – Mr. Boy Mar 11 '10 at 15:54
  • You can always obliterate a file by logging in as an admin on the svn server and doing shenanigans in the repository itself. – JSBձոգչ Mar 11 '10 at 22:37
  • 1
    Hardly a good approach though. I know open-source people don't like good UI, but having to take down the repo and rip it apart is taking it a bit far :) – Mr. Boy Mar 12 '10 at 09:54
3

I have used various version control systems for about 15 years now and have never needed a feature like this.

I wonder what the reasons are that you want that feature:

  • disc space? Hard to believe considering the price of disc space
  • committed a password to version control? Well, that will teach you. Go and change the password
  • speed of the repository? Doesn't sound like it, but if so, I would consider a completely different system with supposedly better performance.
Jens Schauder
  • 5
    o Committed your entire financial records to an open-source system? – Mr. Boy Mar 11 '10 at 15:24
  • @John: "Oops" wouldn't begin to say it... ;-) – T.J. Crowder Mar 11 '10 at 15:28
  • @Matthew: People do make mistakes. – Dimitri C. Mar 11 '10 at 16:06
  • 2
    Especially when a non-programmer uses SVN. They often struggle to grasp the concepts even using visual tools, and commit all kinds of rubbish. – Mr. Boy Mar 11 '10 at 17:29
  • 2
    I know people make mistakes. That wouldn't stop me from using my right to free speech in the form of sarcasm and calling them a moron either. (I would even call myself a moron if I did something like this. And I do every time I accidentally check in a password.) – Matthew Whited Mar 11 '10 at 17:45
  • You _don't get_ free speech on SO. My comment got deleted on another post because I said it was dumb someone assumed I was running Linux. – Mr. Boy Mar 12 '10 at 09:53
2

Last I checked it was intended as an ADMIN feature, and the admin can already dump/filter/broken_workaround and remove history anyway. With regard to the audit trail, this doesn't change the current situation. It would just make things less horrible if something absolutely must be removed.

Svnadmin obliterate is one of the most requested features, and the devs finally admit it should exist (finally! after 8 years!!!). And the publicity of it not existing is chasing users away from SVN.

Unfortunately I had to learn about this "missing feature" the hard way. Since when is basic functionality a feature? New users are starting to hear about this and avoid SVN. As for me, I now use Git.

Don't like my opinion? Linus referred to the SVN developers as morons, and the whole centralized system as flawed. I trust Linus as a true expert, and specifically, he knows about source control.

J. M. Becker
1

Obliterate is not an essential feature of Subversion, because it actually breaks the basic principles of version control (which is: to record all history).

And it isn't an essential feature because there is a workaround to get this done anyway (using svnadmin and filtering).

Also, the feature is currently being worked on heavily. See this post for details.

Stefan
0

What I do - not use subversion. Sorry.

They (the developers) obviously don't agree with your assessment of that being a critical feature. That did not stop the company I currently work at from using it ;) I personally rule out Subversion for this exact reason.

TomTom
  • 2
    Is your employer now regretting the decision because it has to spend so much of its budget on disks now? I mean, how big an effect does this feature's absence really have? – Rob Kennedy Mar 11 '10 at 15:24
  • 1
    Maybe a stupid question, but which system != VSS supports obliterate? I wouldn't be surprised that the "wisdom of the crowds" shows, that this is no critical feature... – Marc Wittke Mar 11 '10 at 15:29
  • Been using SourceGear Vault for years now ;) It really depends on what you do - if your source control contains a lot of binary data.... as ours sometimes does.... things get nasty fast (missing delta). Not everyone deals only with code. I know one time we added around 300 MB per day into the source control system. – TomTom Mar 11 '10 at 15:53
  • 1
    @Mark Wittke: CVS does, IIRC as an admin feature. – David Thornley Mar 11 '10 at 16:02
  • 1
    @TomTom: As far as I know Subversion supports binary diffs. Is obliterate in case of SourceGear a method to avoid problems because of the inferiority of the software? – Mnementh Mar 11 '10 at 16:17
  • Hardly given that Sourcegear has a top end data store behind it. – TomTom Mar 11 '10 at 17:16
  • @MarcWittke: Git lets you completely obliterate data (usually using `git rebase`, though there are various ways to do it). – sleske Dec 05 '12 at 00:43