29

I'm curious what exactly the behavior is on the following:

FileInfo info = new FileInfo("C:/testfile.txt.gz");
string ext = info.Extension;

Will this return ".txt.gz" or ".gz"?

What is the behavior with even more extensions, such as ".txt.gz.zip" or something like that?

EDIT:

To be clear, I've already tested this. I would like an explanation of the property.

Codeman
  • 14
    Did you run the code to see what it returns? Really really easy to do that since you already wrote the code. – Gromer Oct 02 '12 at 17:51
  • 2
    Of course, I would just like the explanation behind what `info.Extension` does exactly, so I can write my unit tests around those assumptions :) – Codeman Oct 02 '12 at 17:52
  • @Gromer Erwin below gave me what I was looking for - actual code behind, rather than just empirical tests. – Codeman Oct 02 '12 at 17:56
  • 1
    For those downvoting - I'm not asking this because I'm too lazy to write my own tests, I'm asking because I wanted to know what was happening behind the scenes. – Codeman Oct 02 '12 at 17:57
  • 3
    I hear ya. If you haven't used it before, ILSpy is a pretty nice tool for checking out what various .NET methods do. It can be slow sometimes, but it's great for seeing how parts of the Framework are written. – Gromer Oct 02 '12 at 17:59
  • 1
    @Gromer I agree... also gives you a good insight into patterns and practices as well - so a good tool whatever you're doing. – Phil Cooper Feb 04 '14 at 11:31

4 Answers

47

It will return .gz, but the MSDN documentation (FileSystemInfo.Extension Property) doesn't make clear why:

"The Extension property returns the FileSystemInfo extension, including the period (.). For example, for a file c:\NewFile.txt, this property returns ".txt"."

So I looked up the code of the Extension property with Reflector:

public string Extension
{
    get
    {
        int length = this.FullPath.Length;
        int startIndex = length;
        while (--startIndex >= 0)
        {
            char ch = this.FullPath[startIndex];
            if (ch == '.')
            {
                return this.FullPath.Substring(startIndex, length - startIndex);
            }
            if (((ch == Path.DirectorySeparatorChar) || (ch == Path.AltDirectorySeparatorChar)) || (ch == Path.VolumeSeparatorChar))
            {
                break;
            }
        }
        return string.Empty;
    }
}

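It checks every character starting from the end of the file path. As soon as it finds a dot, it returns the substring from that dot to the end of the path. If it reaches a directory or volume separator before finding a dot, it stops and returns an empty string.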
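The backward scan in the getter above can be tried on plain strings without going through FileInfo. The following is a minimal sketch, not framework code: the class `ExtensionSketch` and the method `LastDotExtension` are hypothetical names made up for illustration, and the loop takes an arbitrary string instead of `this.FullPath`.

```csharp
using System;
using System.IO;

public static class ExtensionSketch
{
    // Hypothetical helper: the same backward scan as the decompiled getter,
    // applied to any string rather than to FileSystemInfo.FullPath.
    public static string LastDotExtension(string path)
    {
        for (int i = path.Length - 1; i >= 0; i--)
        {
            char ch = path[i];
            if (ch == '.')
            {
                // First dot seen from the end: return it plus everything after it.
                return path.Substring(i);
            }
            if (ch == Path.DirectorySeparatorChar ||
                ch == Path.AltDirectorySeparatorChar ||
                ch == Path.VolumeSeparatorChar)
            {
                // Reached the start of the file name without seeing a dot.
                break;
            }
        }
        return string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(LastDotExtension("C:/testfile.txt.gz"));      // .gz
        Console.WriteLine(LastDotExtension("C:/testfile.txt.gz.zip"));  // .zip
        // A dot in a directory name is ignored because the scan stops at '/'.
        Console.WriteLine(LastDotExtension("C:/my.folder/readme") == string.Empty); // True
    }
}
```

The third call is the case the separator check exists for: without it, a dot in a folder name would be mistaken for the start of an extension.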

Erwin
  • 1
    That's an implementation detail, not something you should rely upon. What you should rely upon is documentation. Implementation details are subject to change. –  Oct 02 '12 at 17:58
  • 1
    @hvd the documentation is not clear on the behavior in this case. From [MSDN](http://msdn.microsoft.com/en-us/library/system.io.filesysteminfo.extension.aspx): `The Extension property returns the FileSystemInfo extension, including the period (.). For example, for a file c:\NewFile.txt, this property returns ".txt".` – Codeman Oct 02 '12 at 18:02
  • @Pheonixblade9 I know, I'm trying to find documentation that does answer the question. :) –  Oct 02 '12 at 18:03
  • 1
    It seems perfectly clear to me. It returns the "extension". What's not clear? I don't understand the confusion. – Chris Dunaway Oct 02 '12 at 21:35
  • 4
    Extension may be "multiDotted", as in SaveFileDialog. – ephraim Oct 25 '17 at 05:48
10
[TestCase(@"C:/testfile.txt.gz", ".gz")]
[TestCase(@"C:/testfile.txt.gz.zip", ".zip")]
[TestCase(@"C:/testfile.txt.gz.SO.jpg", ".jpg")]
public void TestName(string fileName, string expected)
{
    FileInfo info = new FileInfo(fileName);
    string actual = info.Extension;
    Assert.AreEqual(expected, actual); // NUnit expects (expected, actual)
}

All pass

Johan Larsson
7

It returns the extension from the last dot, because it can't guess whether another part of the filename is part of the extension. In the case of testfile.txt.gz, you could argue that the extension is .txt.gz, but what about System.Data.dll? Should the extension be .Data.dll? Probably not... There's no way to guess, so the Extension property doesn't try to.
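To see why guessing is a problem, here is a rough sketch of the alternative rule, where the extension starts at the first dot of the file name. `CompoundExtensionSketch` and `FirstDotExtension` are hypothetical names for illustration only, not a framework API. The only framework call used is `Path.GetFileName`.

```csharp
using System;
using System.IO;

public static class CompoundExtensionSketch
{
    // Hypothetical helper: treats everything from the FIRST dot in the
    // file name as the extension, which is the opposite of what
    // FileInfo.Extension does.
    public static string FirstDotExtension(string path)
    {
        string name = Path.GetFileName(path);
        int firstDot = name.IndexOf('.');
        return firstDot < 0 ? string.Empty : name.Substring(firstDot);
    }

    public static void Main()
    {
        // Looks reasonable for a compressed text file...
        Console.WriteLine(FirstDotExtension("C:/testfile.txt.gz")); // .txt.gz
        // ...but gives a questionable answer for a dotted assembly name.
        Console.WriteLine(FirstDotExtension("System.Data.dll"));    // .Data.dll
    }
}
```

Neither rule is right for every file name, so if your code needs compound extensions such as .tar.gz, it probably has to match against an explicit list of known suffixes rather than rely on a general rule.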

Thomas Levesque
  • 1
    Does Windows specify anywhere that extensions can contain a period? I've never seen that so I would assume that the extension is anything after the last period. – Chris Dunaway Oct 02 '12 at 21:37
2

The file extension starts at the last dot. Unfortunately, the documentation for FileSystemInfo.Extension doesn't answer that, but it logically must return the same value as Path.GetExtension, for which the documentation states:

Remarks

The extension of path is obtained by searching path for a period (.), starting with the last character in path and continuing toward the start of path. If a period is found before a DirectorySeparatorChar or AltDirectorySeparatorChar character, the returned string contains the period and the characters after it; otherwise, Empty is returned.

For a list of common I/O tasks, see Common I/O Tasks.
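Rather than assume the two APIs agree, you can check it for the inputs you care about. The snippet below is a small sketch that calls `Path.GetExtension` and `FileInfo.Extension` on the same paths and prints both results. The class name `GetExtensionDemo` and the sample paths are made up for illustration, and the files do not need to exist.

```csharp
using System;
using System.IO;

public static class GetExtensionDemo
{
    public static void Main()
    {
        string[] paths =
        {
            "C:/testfile.txt.gz",
            "C:/testfile.txt.gz.zip",
            "C:/my.folder/readme" // dot only in the directory name
        };

        foreach (string path in paths)
        {
            // Documented behavior: search backwards for the last period,
            // stopping at a directory separator.
            string fromPath = Path.GetExtension(path);

            // The property under discussion; constructing a FileInfo
            // does not touch the disk.
            string fromInfo = new FileInfo(path).Extension;

            Console.WriteLine($"{path}: Path='{fromPath}' FileInfo='{fromInfo}' same={fromPath == fromInfo}");
        }
    }
}
```

This only shows agreement on the runtime you run it on. It is an empirical check, not a documented guarantee.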

It would be nice if there were an authoritative answer on file names in general, but I'm having trouble finding one.

  • "but it **logically *must*** return the same value" - Unless you've looked up the code for `Path.GetExtension()` how can you actually say that? And if you have, you should add that to your post. As it is, this is just an assumption on your side, and you know what they say about assumptions... – cogumel0 Jun 02 '16 at 11:17
  • 1
    @cogumel0 I specifically posted this answer because I *don't* want to look at the code. There was already an answer that looked at the code. If it turns out the code is wrong or suboptimal, a future update of .NET Framework could change the code. If there is an explicit promise in the documentation about what is and isn't a file extension, that's much less likely to change. And the remark in my answer here was the closest I could find. –  Jun 02 '16 at 11:48
  • I don't mind your remark nor the fact that you did not post code, in fact I love that you took that from MSDN and explained it rather thoroughly. However, the sentence which I highlighted above promises an irrefutable truth, but because you'd done nothing to prove that truth, I have to question it. What makes you say, beyond any reasonable doubt, that `FileSystemInfo.Extension` ***must*** return the same value as `Path.GetExtension`? I find nothing on MSDN to prove that to be the case and *if* you have no proof of this, unfortunately you just made an assumption that the two behave the same. – cogumel0 Jun 02 '16 at 15:00
  • 1
    In fact, even if you had looked at the code and realized that either they share the same code or even that `Path.GetExtension` just calls `FileSystemInfo.Extension` in the back end... like you said, this could be changed at a moment's notice, so without something in the documentation to show that, until further notice, this is the case, I don't think you're in a position to make such strong statements. In short, I love your answer, I just take offence at the "logically must" part; I think a "should" is much better suited there. – cogumel0 Jun 02 '16 at 15:05
  • @cogumel0 Oh, now I see how you read my answer. That's not what I'm saying. By "logically must", I'm saying it would be illogical if `Path.GetExtension` and `FileSystemInfo.Extension` use a different definition of "extension". In order for those methods to be logical, they must agree on the definition. –  Jun 02 '16 at 15:10
  • I understand what you're trying to convey, but while it may be *logical*, it ***may*** not be true. It is entirely possible that one doesn't call the other and that the code of one gets updated while forgetting about the second. It may be possible that one of them has a bug, while the other doesn't, etc. I'm sort of just playing devil's advocate here and I am sure that you are right, but it is still just an assumption unless you have proof to back it up. – cogumel0 Jun 02 '16 at 15:19