
Q1 : Does TFS guarantee its storage?

For instance, if I commit a file with content

printfn "%s" "fizz buzz"

will I get back a file with content

printfn "%s" "fizz buzz"

or will I get back a file with content

printfn "%s" "fizz bUzz"

and an error, or some way of identifying that the content checked out was incorrect?

Git uses SHA-1 to do this kind of thing.

Q2 : What happens on a git checkout when the content of a file in the .git store becomes corrupted, and its SHA pointer is now 'incorrect'? I have never tried, but what would happen at the git command line when the bits on the disk of a git repo get corrupted?

Q3 : Similarly, what would happen on a TFS checkout when the content of files in the TFS database becomes corrupted?

Vince
judek
  • This should really be three separate questions - or at the very least two: Q1 and Q3 are related, at least, but Q2 has nothing to do with Q1 or Q3. – Edward Thomson Aug 19 '14 at 22:28
  • Thank you. The question is getting at how TFS can guarantee integrity. There is an old but extensively watched video here https://www.youtube.com/watch?v=4XpnKHJAok8, where L. Torvalds makes comments about 3 main things you would want from a source control. One of them is about guaranteeing the integrity of the data in source control. Git does this with SHA-1. Git makes a big point of this, but TFS does not seem to highlight this. Hg and Monotone also do the same thing. The question was getting at whether TFS does the same or similar thing. – judek Aug 20 '14 at 10:38
  • It sounds like it does. Hence Q1 seems to be answered. With respect to Q2, yes perhaps another question on Stackoverflow. – judek Aug 20 '14 at 10:39

1 Answer


TFS uses MD5 for its checksum. When you upload a file as part of a changeset or shelveset, you also send the MD5 that you calculated. The server will also calculate the MD5 of your upload contents to validate that there was no corruption on the wire. Similarly, when you perform a Get from the server, it will deliver the MD5 of the content and clients will validate that the checksum matches.
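To make that round trip concrete, here is a minimal Python sketch of the idea. It is not TFS code or the TFS API: `ToyServer`, `upload`, `download`, `get` and `ChecksumMismatch` are hypothetical names invented for illustration, and the in-memory dictionary stands in for the database.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Return the MD5 digest of content as a hex string."""
    return hashlib.md5(content).hexdigest()


class ChecksumMismatch(Exception):
    """Raised when content does not match the MD5 that accompanies it."""


class ToyServer:
    """Hypothetical stand-in for the server; not the real TFS API."""

    def __init__(self):
        self._store = {}  # path -> (content, md5 recorded at check-in)

    def upload(self, path: str, content: bytes, client_md5: str) -> None:
        # The server recomputes the MD5 to detect corruption on the wire.
        if md5_hex(content) != client_md5:
            raise ChecksumMismatch("upload of %s was corrupted" % path)
        self._store[path] = (content, client_md5)

    def download(self, path: str):
        # The server delivers the content together with the stored MD5.
        return self._store[path]


def get(server: ToyServer, path: str) -> bytes:
    """Client-side get: validate the content against the delivered MD5."""
    content, stored_md5 = server.download(path)
    if md5_hex(content) != stored_md5:
        raise ChecksumMismatch("content of %s does not match its MD5" % path)
    return content


server = ToyServer()
original = b'printfn "%s" "fizz buzz"\n'
server.upload("$/Foo.cs", original, md5_hex(original))
restored = get(server, "$/Foo.cs")
```

In this sketch, if the stored bytes later changed to `"fizz bUzz"` while the recorded MD5 stayed the same, `get` would raise `ChecksumMismatch` rather than hand back the altered file.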

As to your question about corruption, I can only speak hypothetically and only from having worked on TFS. Obviously if your database is corrupt, all bets are off and we make no guarantees as to anything.

When you say "a TFS checkout" (after having discussed "a git checkout" above it), I assume you mean "a TFS working folder mapping". The only problem should be when you try to do a get of the corrupted file. For example, if you had a corrupt $/Foo.cs at changeset 42 and you did a get of changeset 42, then it would fail. Your local version would not be updated on-disk or on the server and you would not be at version 42 locally.

If, however, there existed a $/Foo.cs at version 43 and it was not corrupt, then you should be able to get changeset 43 without problems. The server would not examine that previous version.

If you had gotten $/Foo.cs at changeset 42 before the database became corrupt, you should be able to check the file out (in the TFS terminology of "checkout", meaning to pend an edit), make changes, then check the file in, but there's certainly no guarantee of this. You would need to make changes, since if a client tries to produce a changeset or shelveset that includes a file with the MD5 of the current shelveset, the server will instruct the client not to bother sending the changes.

Again, this is all fairly hypothetical. We didn't exhaustively test data corruption scenarios. Realistically, if your database is corrupt, all bets are off and you should restore from a backup.

Edward Thomson
  • Thanks Edward. Your comments suggest MD5 checksums are used to check integrity of messaging at check-in and checkout. But, is the original MD5 checksum at check-in stored and compared with the MD5 checksum that is calculated when data is retrieved at checkout time? – judek Aug 20 '14 at 10:44
  • Yes, of course. The server delivers the content and the MD5 (which was calculated at checkin time by the client, validated by the server during the upload and stored in the database). – Edward Thomson Aug 20 '14 at 11:24
  • Thanks Edward. That answers a criticism of 'other' SC systems in the youtube.com/watch?v=4XpnKHJAok8. The video is May 2007, 7 years ago, but still, has widespread views. Regards – judek Aug 20 '14 at 12:06
  • I just realized that TFS applies MD5 to each file, while Git applies SHA-1 to an entire commit. – Giulio Vian Aug 20 '14 at 13:06
  • Git uses a SHA-id to access content. Git 'content' is addressable by a content-id, which is the SHA of the content. You access content by its checksum. The commit has an id, which is its SHA, that points to a manifest (~dir-listing) with its own SHA, and then you get to the file content via its SHA. See https://www.youtube.com/watch?v=ZDR433b0HJY. You can't change data without changing the SHA-id you use to get to that data. – judek Aug 20 '14 at 14:25
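
As a rough illustration of the content addressing described in the last comment, the Python sketch below computes the id Git assigns to a file's content (a blob) in a SHA-1 repository: the SHA-1 of a short header, `blob <size>` followed by a NUL byte, and then the content itself. The function name `git_blob_id` is made up for this example. Only blobs are covered here; trees and commits are hashed in the same manner with their own type names in the header.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id Git gives to a blob with this content."""
    # Git hashes "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


good = git_blob_id(b'printfn "%s" "fizz buzz"\n')
bad = git_blob_id(b'printfn "%s" "fizz bUzz"\n')
ids_differ = good != bad  # one changed character gives a different id
```

Because the id is derived from the bytes, content that has changed on disk no longer hashes to the id used to look it up, which is what makes the corruption detectable.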