
Using the following simple file:

using System;

public class Program
{
    [STAThread]
    public static void Main(string[] args)
    {
        Console.WriteLine("Boo");
    }
}

And then using the following command:

csc /target:exe /debug:pdbonly HelloWorld.cs

If I run this command when no PDB exists yet, the resulting PDB file is 12 KB. If a PDB file already exists from a previous build, the new file is 14 KB.

Microsoft (R) Visual C# Compiler version 4.0.30319.17929
.NET 4.5

Does anyone have any idea what would explain this?

UPDATE:

  1. I do not see this with .NET 3.5 and, judging from the comments, it does not happen with .NET 4 either.
  2. Using pdb2xml (http://blogs.msdn.com/b/jmstall/archive/2005/08/25/sample-pdb2xml.aspx), I cannot see any difference between the small and the larger one.
REA_ANDREW
  • You probably want to mention the version of the compiler you used. – leppie Mar 06 '13 at 15:22
  • You beat me to it :-) – REA_ANDREW Mar 06 '13 at 15:23
  • I use 4.0.30319.1, and I cannot reproduce the effect. Always 12 KB in size. – John Willemse Mar 06 '13 at 15:31
  • I am using .NET 4.5 (4.0.30319.17929) – REA_ANDREW Mar 06 '13 at 15:45
  • Maybe the 2 KB is information added by the operating system because the file is being overwritten. – Jehof Mar 08 '13 at 15:36
  • I'm not entirely able to reproduce this on a regular basis, but I managed to do a **[diff](http://pastebin.com/cRg6mkT0)** between the two files as I was able to reproduce this at least once. I assume it's just the order in which the components are being processed (multithreading?). Additional padding (block of 2KB?) will result in the file size change. See the raw view for best results. – Caramiriel Mar 08 '13 at 18:11
  • dia2dump returns the identical result, I put the dump here: http://www.heypasteit.com/clip/0Q1T – user287107 Mar 08 '13 at 23:54
  • PDB files use an undocumented file format ("MSF") that represents some kind of a virtual file system, so the size of the physical .PDB file doesn't represent the size of used data in it (you can have unused allocated pages that still take physical space). More here: https://code.google.com/p/pdbparser/wiki/MSF_Format – Simon Mourier Mar 10 '13 at 15:00
  • Is the version of the application set to automatically increase for each build? Maybe it is somehow linked? – rhughes Mar 11 '13 at 11:14
  • This is tested with 1 file, HelloWorld.cs. This is not a project, simply a file and the compiler. – REA_ANDREW Mar 11 '13 at 12:57

3 Answers

18

My answer is simple, but maybe not so accurate. Let's use one debugger tool on our PDB files:

[Image: PDB]

The only difference is the PdbAge field. This means the PDB file is not recreated on each compilation! The existing file is modified, and that is why its size changes.

My guess is confirmed in this article. Quote:

One of the most important motivations for the change in format was to allow incremental linking of debug versions of programs, a change first introduced in Visual C++ version 2.0.

Another question is what exactly changes in this file. The most detailed explanation of the file format I have found is in Sven B. Schreiber's book "Undocumented Windows 2000 Secrets: A Programmer's Cookbook". The key passage is:

An even greater benefit of the PDB format becomes apparent when updating an existing PDB file. Inserting data into a file with a sequential structure usually means reshuffling large portions of the contents. The PDB file's random-access structure borrowed from file systems allows addition and deletion of data with minimal effort, just as files can be modified with ease on a file system media. Only the stream directory has to be reshuffled when a stream grows or shrinks across a page boundary. This important property facilitates incremental updating of PDB files.

He explains that not all of the data in the file is in use at any given moment. Some byte ranges are simply filled with zeros until the file is modified during the next compilation.

So I can't tell exactly what has changed in the PDB file, apart from a GUID and the Age number. You can go deeper after reading that book. Good luck!

UPDATE (15/03/2013):

I spent some more time comparing the files. Opening them in a hex editor, I see differences in the header: the page size is 512 bytes (the value 200h at +20h), and the page count differs: 120 and 124 (078h and 07Ch respectively). In my screenshots the smaller file is on the left. The difference in file size is exactly 2048 bytes, which means the compiler adds 4 pages of data the second time.

Then I went through all the other differences. The first three quarters of the file contain only small diffs, usually a few bytes each. But at offset 2600h we see: [Image: Diff]
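The header check above could be sketched in Python roughly as follows. It relies only on what this answer states (a 32-bit page size stored at +20h); reading it as little-endian is an assumption, and the helper names and the synthetic buffers are hypothetical stand-ins for the bytes of the two real PDB files.

```python
import struct

PAGE_SIZE_OFFSET = 0x20  # offset of the page-size field, as observed in the hex view


def read_page_size(pdb_bytes: bytes) -> int:
    """Return the 32-bit value at +20h, read as little-endian (assumption)."""
    (page_size,) = struct.unpack_from("<I", pdb_bytes, PAGE_SIZE_OFFSET)
    return page_size


def size_in_pages(pdb_bytes: bytes) -> int:
    """Return how many whole pages the physical file occupies."""
    page_size = read_page_size(pdb_bytes)
    if page_size == 0 or len(pdb_bytes) % page_size != 0:
        raise ValueError("file length is not a whole number of pages")
    return len(pdb_bytes) // page_size


def fake_pdb(total_size: int, page_size: int = 0x200) -> bytes:
    """Build a synthetic buffer with only the page-size field filled in."""
    buf = bytearray(total_size)
    struct.pack_into("<I", buf, PAGE_SIZE_OFFSET, page_size)
    return bytes(buf)


small = fake_pdb(11776)          # size of the first-build PDB
large = fake_pdb(11776 + 2048)   # size after the second build
extra_pages = size_in_pages(large) - size_in_pages(small)
print(read_page_size(small), extra_pages)
```

With real files, `open(path, "rb").read()` would replace the `fake_pdb` calls; the point is only that a 2048-byte growth at a 512-byte page size corresponds to 4 pages.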

Look! The line /LinkInfo./names./src/files/c:\Windows\microsoft.net\framework\v4.0.30319\helloworld.cs has been truncated and now contains inconsistent information.

Looking further, I found this line in full in the second (bigger) file: [Image: Diff2] The information has now been written into what was free space (see the zeros on the left side). I guess the old pages (with the truncated string) were marked as unused space.

And at the end of the file I found exactly 2048 bytes of new data, all zeros, starting at 2E00h (11776 in decimal) and ending at 35F8h (13816 in decimal). Remember that the size of the first file was exactly 11776 bytes.
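A small Python sketch of that last check, under the assumption that the second file is simply the first file's length plus a block of padding. The buffers here are synthetic and the function names are hypothetical; only the sizes come from the observations above.

```python
FIRST_SIZE = 0x2E00      # 11776 bytes, the size of the first-build PDB
GROWTH = 4 * 0x200       # four 512-byte pages = 2048 bytes


def appended_tail(second: bytes, first_size: int) -> bytes:
    """Return the bytes of the larger file that lie beyond the smaller file's length."""
    return second[first_size:]


def is_all_zero(chunk: bytes) -> bool:
    """True when every byte in the chunk is 00h."""
    return chunk.count(0) == len(chunk)


# Synthetic stand-ins: non-zero "content" followed by zero padding.
first = b"\x01" * FIRST_SIZE
second = first + bytes(GROWTH)

tail = appended_tail(second, len(first))
print(len(tail), is_all_zero(tail), hex(len(second)))
```

Note that 11776 + 2048 = 13824, which is 3600h, so the 35F8h quoted above is presumably the offset of the last row or group shown in the hex editor rather than the file length.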

In conclusion: I think the bigger file doesn't contain any new information. But I still can't answer why the compiler added 4 empty pages to the end of the program database file. I think that knowledge is a secret of the compiler's developers.

Anthony
  • Yup. It is also easy to see by looking at the creation date timestamp of the .pdb file. – Hans Passant Mar 14 '13 at 22:23
  • Thanks for the answer, that looks like a useful tool that I have not seen before. I see a mention of V2.0, but the size stays the same with versions before 4.5. I cannot seem to find any up-to-date information that explicitly states what change in the compiler would cause this. – REA_ANDREW Mar 15 '13 at 08:48
  • Even with this answer I am still confused as to why this happens with the .NET 4.5 compiler but not with the earlier compilers that the linked material refers to. All the information referenced pre-dates the .NET 4.5 compiler, and that is the only place I see the problem, which is why I find the behaviour strange. Please let me know if anyone disagrees and feels this does explain why it only occurs with the .NET 4.5 compiler and not earlier versions. – REA_ANDREW Mar 15 '13 at 15:49
  • @REA_ANDREW I've updated answer with a little bit more information. – Anthony Mar 15 '13 at 20:18
  • Cheers for the effort and the info. I was really hoping for an answer as to why this has made it into .NET 4.5 and what its significance is, i.e. whether it is a bug and, if not, what purpose it serves. I honestly do not feel that this is "THE" answer, but it is definitely a great set of information around the topic and I appreciate your time on this. If this is a .NET compiler developers' secret, as you say in your conclusion, then I feel it should be explained. Ultimately, whether or not this is a bug, it is different from previous frameworks and as such I believe it deserves some information. – REA_ANDREW Mar 16 '13 at 09:52
  • This is an internal implementation detail of the compiler, @rea. Why does it demand or deserve *any* explanation? Microsoft (and any vendor) is free to change the internal implementation of their tools at any time, as long as it doesn't break the public contract. I can't find a way that this does. There was no reason to ever assume that the PDB files would stay the same size between compiles. In fact, there's every reason to assume that they would *not*. I suspect this is the best explanation you'll get, from someone armed with a lot of time and who knows their way around a hex editor. – Cody Gray - on strike Mar 17 '13 at 06:32
  • I agree that the vendor is free to make non-breaking changes at their discretion, but I feel it helps their product to inform users about something that does not appear to have been present before. For example, if I want to publish these to a symbol server, should I publish the PDB from the first compile or the second? If there is no difference in the file, then why should it need to change at all? I think the information in this answer is great, but I do not feel it was the answer. – REA_ANDREW Mar 17 '13 at 09:32
  • Also I am not assuming "anything!" I have made an observation which I cannot explain and have turned to StackOverflow for help in getting an explanation. If you read the question I am stating that the above has a repeatable compile size in .NET frameworks < .NET 4.5. My question was simply "What would explain this?" – REA_ANDREW Mar 17 '13 at 09:35
2

Simon Mourier's comment is almost certainly what's going on. On the second run of the compiler the PDB file gets updated, and the result of that updating leaves 'deleted' or unused blocks inside the PDB. On subsequent builds, instead of allocating new pages for the updates, the unused pages are reused (creating another set of unused pages in the process).

If there were a utility to 'garbage collect' the virtual filesystem, you'd likely end up with a 12KB file again.
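To make the page-reuse idea concrete, here is a toy model in Python. It is not the real MSF/PDB logic: the allocator, the stream names, and the 19 + 4 page split are all hypothetical, chosen only so that the numbers match the 512-byte pages and file sizes reported in the other answer. It assumes an updated stream is written to new pages before its old pages are released.

```python
PAGE_SIZE = 512  # page size reported in the hex analysis


class ToyPagedFile:
    """A toy page allocator; an illustration only, not the real PDB format."""

    def __init__(self):
        self.page_count = 0    # physical pages in the file
        self.free_pages = []   # pages released by earlier updates
        self.streams = {}      # stream name -> list of page numbers

    def _allocate(self, count):
        pages = []
        for _ in range(count):
            if self.free_pages:
                pages.append(self.free_pages.pop())   # reuse a freed page
            else:
                pages.append(self.page_count)         # grow the file
                self.page_count += 1
        return pages

    def write_stream(self, name, pages_needed):
        """Write the new copy first, then release the old copy's pages."""
        old = self.streams.get(name, [])
        self.streams[name] = self._allocate(pages_needed)
        self.free_pages.extend(old)


def build(pdb):
    """Simulate one compile and return the resulting file size in bytes."""
    if "static" not in pdb.streams:
        pdb.write_stream("static", 19)   # hypothetical: data written once
    pdb.write_stream("changing", 4)      # hypothetical: data rewritten every build
    return pdb.page_count * PAGE_SIZE


pdb = ToyPagedFile()
sizes = [build(pdb) for _ in range(3)]
print(sizes)
```

In this model the first build has nothing to release, the second build has no free pages to reuse and so appends four, and every later build recycles the four pages freed by the previous one, so the file stays at the larger size.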

Michael Burr
0

Every compile creates a new, different assembly.

If you would like to take a dive deep into what exactly is different, then you might want to have a look at this article: "hacking with the clr: diffing assemblies".

Things that differ between compilations:

  • Timestamp
  • no-ops
  • ModuleDef GUID
  • Debug Attribute
  • Second Timestamp
  • PDB-GUID
  • Directory Difference
  • Several 4 Byte Offsets (DataDirectory.Debug, SizeOFData, AddressOfRawData, PointerToRawData, DataDirectory.MetaData)

I am not sure where the additional 2 KB between the first and second compilation comes from. My guess is that some information is not included on the first build but is added on every subsequent compile.

Jens H
  • The assembly stays the same. This is the PDB file which is changing. – REA_ANDREW Mar 06 '13 at 16:00
  • @REA_ANDREW, the PDB is directly connected to its associated assembly and also gets re-compiled each time. For example, the original assembly gets a new GUID on every compile, therefore the PDB also gets a new GUID to keep both files in sync. So my above mentioned aspects refer to the PDB, too. More details on this and others can be found in the linked article. – Jens H Mar 06 '13 at 16:16
  • OK to be specific, the "size" of the assembly is staying the same but the size of the "pdb" is changing. I get what you are saying and I am looking into this myself whilst waiting/hoping for an actual answer. – REA_ANDREW Mar 06 '13 at 16:19
  • You're really just rephrasing the question though. "Why does the PDB change in size between first and second compile?" "Because the compiler changes its contents somehow". Yeah, but what are those changes? – jalf Mar 08 '13 at 18:36
  • I am listing a lot of changing contents in my answer, and the linked article gives explicit examples. I think I DID name the changing aspects here. – Jens H Mar 13 '13 at 13:56