
I want to get a file's extension, for example:

import os
print(os.path.splitext('testit.doc'))  # ('testit', '.doc')

But it doesn't work for a double extension such as .tar.gz:

import os
print(os.path.splitext('testid.tar.gz'))  # ('testid.tar', '.gz'), not ('testid', '.tar.gz')

I've noticed that Chrome automatically renames a download when a file with the same name already exists in the target folder: it adds (1), or (n) in general. I want to know how it does this. Could anybody tell me?

ZZB
  • How does it handle a filename like `'I.love.dots.some.extension'`? My guess is that it simply has a list of "multi-extensions" (like `.tar.bz2`, `.tar.gz`, etc.) and checks whether it has to handle the filename in a special way. The other possibility would be to do `filename.split('.', 1)`, but this breaks with `I.love.dots` style filenames. – Bakuriu Mar 27 '13 at 11:56
  • possible duplicate of [Getting file extension using pattern matching in python](http://stackoverflow.com/questions/6525334/getting-file-extension-using-pattern-matching-in-python) – jamylak Mar 27 '13 at 12:03
  • I don't know how they do it, or whether they actually have a very simple approach. Based on the first file (x.tar (1).tar.gz), it looks like they look for a file extension twice. You can test this by creating a file named "mytest.filename.gz" or "mytest.filename.rar": if the result is "mytest.filename (1).gz" or "mytest.filename (1).rar", then it looks like they match against known extensions. Otherwise, they may have a simple "look for an extension twice" approach :) – aweis Mar 27 '13 at 12:09
  • I tested it with "mytest.filename.gz" and got "mytest.filename(1).gz". I also tested "abc.abc" and got "abc(1).abc". It looks like known-extension matching! – ZZB Mar 27 '13 at 12:26
  • @Bakuriu - you are correct. I just checked how it really does it. – zenpoy Mar 27 '13 at 12:41
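To make Bakuriu's point concrete, here is a small sketch comparing two naive ways of taking "everything after the first dot": joining Python 3's `pathlib.PurePath.suffixes` (3.4+, so not usable from the Python 2 code in the question), and `str.split('.', 1)`. The helper names are made up for illustration.

```python
from pathlib import PurePath


def ext_from_suffixes(name):
    # PurePath.suffixes lists every dot-separated suffix, e.g. ['.tar', '.gz'].
    return ''.join(PurePath(name).suffixes)


def ext_from_first_dot(name):
    # Everything after the first dot, as with filename.split('.', 1).
    parts = name.split('.', 1)
    return '.' + parts[1] if len(parts) == 2 else ''
```

Both return '.tar.gz' for 'testid.tar.gz', but both also return '.love.dots.some.extension' for 'I.love.dots.some.extension', which is exactly the failure Bakuriu describes.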

2 Answers


Luckily, Chromium is open source, so you can read its well-documented code. I found the relevant part here:

RenameAndUniquify looks like this:

void DownloadFileImpl::RenameAndUniquify(
    const base::FilePath& full_path,
    const RenameCompletionCallback& callback) {
  DCHECK(BrowserThread::CurrentlyOn(BrowserThread::FILE));

  base::FilePath new_path(full_path);

  int uniquifier =
      file_util::GetUniquePathNumber(new_path, FILE_PATH_LITERAL(""));
  if (uniquifier > 0) {
    new_path = new_path.InsertBeforeExtensionASCII(
        base::StringPrintf(" (%d)", uniquifier));
  }

...

}
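The excerpt asks file_util::GetUniquePathNumber for the first free number and then inserts " (n)" before the extension. A rough Python analogue of that flow might look like the sketch below. It is not Chromium's code: the 100-attempt cap is an assumption, raising FileExistsError when no slot is free is my own choice, and it uses os.path.splitext, so unlike Chrome it would turn x.tar.gz into "x.tar (1).gz".

```python
import os

MAX_UNIQUE_FILES = 100  # assumed cap on attempts, not taken from Chromium


def unique_path_number(path):
    """Return 0 if path is free, else the first n whose ' (n)' variant is free, or -1."""
    if not os.path.exists(path):
        return 0
    root, ext = os.path.splitext(path)
    for n in range(1, MAX_UNIQUE_FILES + 1):
        if not os.path.exists('%s (%d)%s' % (root, n, ext)):
            return n
    return -1


def rename_and_uniquify(path):
    """Insert ' (n)' before the extension when n > 0, like the excerpt above."""
    n = unique_path_number(path)
    if n < 0:
        raise FileExistsError('no free name for %r' % path)
    if n > 0:
        root, ext = os.path.splitext(path)
        path = '%s (%d)%s' % (root, n, ext)
    return path
```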

InsertBeforeExtensionASCII goes through InsertBeforeExtension, which calls ExtensionSeparatorPosition. That is the part that interests you:

// Find the position of the '.' that separates the extension from the rest
// of the file name. The position is relative to BaseName(), not value().
// This allows a second extension component of up to 4 characters when the
// rightmost extension component is a common double extension (gz, bz2, Z).
// For example, foo.tar.gz or foo.tar.Z would have extension components of
// '.tar.gz' and '.tar.Z' respectively. Returns npos if it can't find an
// extension.
StringType::size_type ExtensionSeparatorPosition(const StringType& path) {
  // Special case "." and ".."
  if (path == FilePath::kCurrentDirectory || path == FilePath::kParentDirectory)
    return StringType::npos;

  const StringType::size_type last_dot =
      path.rfind(FilePath::kExtensionSeparator);

  // No extension, or the extension is the whole filename.
  if (last_dot == StringType::npos || last_dot == 0U)
    return last_dot;

  const StringType::size_type penultimate_dot =
      path.rfind(FilePath::kExtensionSeparator, last_dot - 1);
  const StringType::size_type last_separator =
      path.find_last_of(FilePath::kSeparators, last_dot - 1,
                        arraysize(FilePath::kSeparators) - 1);

  if (penultimate_dot == StringType::npos ||
      (last_separator != StringType::npos &&
       penultimate_dot < last_separator)) {
    return last_dot;
  }

  for (size_t i = 0; i < arraysize(kCommonDoubleExtensions); ++i) {
    StringType extension(path, penultimate_dot + 1);
    if (LowerCaseEqualsASCII(extension, kCommonDoubleExtensions[i]))
      return penultimate_dot;
  }

  StringType extension(path, last_dot + 1);
  for (size_t i = 0; i < arraysize(kCommonDoubleExtensionSuffixes); ++i) {
    if (LowerCaseEqualsASCII(extension, kCommonDoubleExtensionSuffixes[i])) {
      if ((last_dot - penultimate_dot) <= 5U &&
          (last_dot - penultimate_dot) > 1U) {
        return penultimate_dot;
      }
    }
  }

  return last_dot;
}
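To experiment without building Chromium, here is a rough Python port of ExtensionSeparatorPosition plus a helper that inserts a suffix before the extension. Assumptions: it takes a bare file name, so the path-separator check is dropped; the suffix list (gz, bz2, z) is taken from the comment in the code above; kCommonDoubleExtensions is not shown in the excerpt, so the 'user.js' entry is a guess. The Python names are hypothetical.

```python
# Assumed from the comment above ("gz, bz2, Z"); matching is case-insensitive.
COMMON_DOUBLE_EXTENSION_SUFFIXES = ('gz', 'bz2', 'z')
# Guessed contents: whole two-part extensions matched as-is.
COMMON_DOUBLE_EXTENSIONS = ('user.js',)


def extension_separator_position(name):
    """Index of the '.' that starts the extension of a bare file name, or -1."""
    if name in ('.', '..'):
        return -1
    last_dot = name.rfind('.')
    # No extension (-1), or the whole name is the extension (0, e.g. '.bashrc').
    if last_dot <= 0:
        return last_dot
    penultimate_dot = name.rfind('.', 0, last_dot)
    if penultimate_dot == -1:
        return last_dot
    # A known whole double extension such as 'user.js'.
    if name[penultimate_dot + 1:].lower() in COMMON_DOUBLE_EXTENSIONS:
        return penultimate_dot
    # A gz/bz2/z suffix preceded by a 1-4 character component, e.g. '.tar.gz'.
    if (name[last_dot + 1:].lower() in COMMON_DOUBLE_EXTENSION_SUFFIXES
            and 1 < last_dot - penultimate_dot <= 5):
        return penultimate_dot
    return last_dot


def insert_before_extension(name, suffix):
    """Insert suffix (e.g. ' (1)') right before the detected extension."""
    pos = extension_separator_position(name)
    if pos == -1:
        return name + suffix
    return name[:pos] + suffix + name[pos:]
```

Under these assumptions it follows the pattern ZZB observed: 'abc.abc' becomes 'abc (1).abc', and 'mytest.filename.gz' becomes 'mytest.filename (1).gz' because 'filename' is longer than four characters, while 'testid.tar.gz' becomes 'testid (1).tar.gz'.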
zenpoy

I think it uses a list of well-known file extensions. You can do the same. There are many ways to do it (some possibly faster than my solution, for example using a regex), but this is a very simple one:

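import os

# Multi-part extensions that os.path.splitext would split at the last dot only.
known_extensions = ['.tar.gz', '.tar.bz2']

def splitext(file_name):
    """Like os.path.splitext, but keeps known double extensions together."""
    file_name = file_name.strip()

    for ex in known_extensions:
        # Case-insensitive match, so 'BACKUP.TAR.GZ' is handled as well.
        if file_name.lower().endswith(ex):
            return file_name[:-len(ex)], file_name[-len(ex):]

    # Anything else: fall back to the standard single-extension split.
    return os.path.splitext(file_name)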
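The answer mentions a regex as an alternative. A minimal sketch of that idea, with the same two hypothetical double extensions hard-coded into one anchored pattern (not benchmarked, so no claim that it is faster):

```python
import os
import re

# Hypothetical list of double extensions, matched only at the end of the name.
_DOUBLE_EXT = re.compile(r'(\.tar\.(?:gz|bz2))$', re.IGNORECASE)


def splitext_re(file_name):
    """Split off a known double extension if present, else defer to os.path.splitext."""
    m = _DOUBLE_EXT.search(file_name)
    if m:
        return file_name[:m.start()], m.group(1)
    return os.path.splitext(file_name)
```

Adding another double extension means extending the alternation, e.g. (?:gz|bz2|xz), rather than a Python list.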
MostafaR