
My Linux system mounts some Samba shares, and some files are deposited by Windows users. The names of these files sometimes contain spaces and other undesirable characters. Changing these characters to hyphens (-) seems like a reasonable solution. Nothing else needs to be changed to handle these cleaned file names.

A couple of questions:

  • What other characters besides spaces and parentheses should be translated?

  • What other file attributes (besides file type (file/dir) and permissions) should be checked?

  • Does Perl offer a pushd/popd equivalent, or is chdir a reasonable solution for traversing the directory tree?

This is my Perl program:

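#!/usr/bin/env perl
use strict;
use warnings;

use File::Copy;
use Cwd qw(getcwd);

#rename files, map characters (not allowed) to allowed characters
#map [\s\(\)] to "-"

my $verbose = 2;
my $pat     = qr/[\s()]/;

sub clean {
  my ($name) = @_;
  (my $name2 = $name) =~ s/$pat/-/g;

  #skip when unchanged, or when the new name already exists (collision)
  return $name if (($name eq $name2) || -e $name2);

  print "r: $name\n" if ($verbose > 2);
  if (!rename($name, $name2)) {
    warn "rename '$name' -> '$name2' failed: $!\n";
    return $name;
  }

  $name2;
}

sub pDir {
  my ($obj) = @_;
  return if (!-d $obj || -l $obj);    #real directories only, don't follow symlinks
  opendir(my $dh, $obj) or return;    #lexical handle: safe under recursion

  print "p: $obj/\n" if ($verbose > 2);
  chdir($obj) or return;

  #readdir in list context reads every entry before the loop body runs,
  #so renaming entries inside the loop is safe
  foreach my $dir (readdir $dh) {
    next if ($dir =~ /^\.\.?$/);    #skip ./, ../
    pDir(clean($dir));
  }
  closedir($dh);
  chdir("..") or die "cannot return to parent of '$obj': $!\n";
}

sub main {
  my $start = getcwd();
  foreach my $argv (@ARGV) {
    print "$argv/\n" if ($verbose > 3);
    $argv = clean($argv);
    if (-d $argv) { pDir($argv); }
    #an argument like a/b/c leaves us in a/b, so always go back to the start
    chdir($start) or die "cannot return to '$start': $!\n";
  }
}

&main();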

These posts are related, but don't really address my questions,

ChuckCottrill

2 Answers


Here's a different way to think about the problem:

  1. Perl has a built-in rename function. You should use it.

  2. Create a data structure mapping old names to new names. Having this data will allow various sanity checks: for example, you don't want cleaned names stomping over existing files.

  3. If you don't need to process the directories recursively, you can use glob to good advantage (or bsd_glob from File::Glob, which does not split its pattern on whitespace). No need to go through the hassles of opening directories, reading them, filtering out dot-dirs, etc.

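  4. Don't invoke subroutines with a leading ampersand: &name(...) bypasses prototypes, and &name; without parentheses silently passes the caller's @_ along (search for this issue for more details).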

  5. Many Unix-like systems include a Perl-based rename command for quick-and-dirty renaming jobs. It's good to know about even if you don't use it for your current project.

Here's a rough outline:

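use strict;
use warnings;
use File::Glob qw(bsd_glob);    # unlike glob(), bsd_glob() does not split its pattern on whitespace

sub main {
    # Map the input arguments to oldname-newname pairs.
    my @renamings = 
        map { [$_, cleaned($_)] }
        map { -f $_ ? $_ : bsd_glob("$_/*")  }
        @_;

    # Sanity checks first.
    #   - New names should be unique.
    #   - New names should not already exist.
    #   - ...

    # Then rename, reporting any failure.
    for my $rnm (@renamings){
        my ($old, $new) = @$rnm;
        next if $new eq $old;
        rename($old, $new) or warn "Cannot rename '$old' to '$new': $!\n";
    }
}

sub cleaned {
    # Clean only the last path component, so the file stays in its directory.
    # Allowed characters: word characters, hyphens, periods.
    # Adjust as needed.
    my $f = shift;
    my ($dir, $base) = $f =~ m{\A(.*/)?([^/]*)\z}s;
    $dir //= '';
    $base =~ s/[^\w\-\.]/-/g;
    return $dir . $base;
}

main(@ARGV);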
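The sanity checks that the outline leaves as comments could be filled in roughly like this. It is a minimal sketch: `sanity_problems` is a hypothetical helper name, and only the two listed checks (unique new names, no existing target) are implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: given [old, new] pairs, return a list of problems.
# An empty list means the planned renamings look safe to perform.
sub sanity_problems {
    my @renamings = @_;
    my (@problems, %seen);
    for my $rnm (@renamings) {
        my ($old, $new) = @$rnm;
        next if $new eq $old;    # nothing will happen to this file
        push @problems, "'$new' would be produced more than once"
            if $seen{$new}++;
        push @problems, "'$new' already exists"
            if -e $new;
    }
    return @problems;
}
```

In main it would sit between building @renamings and the rename loop, for example `if (my @p = sanity_problems(@renamings)) { die join("\n", @p), "\n" }`, so nothing is renamed unless the whole plan is clean.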
FMc
  • Very nice code. Still: (1) error checking on `rename` sounds like a good idea. (2) I'm surprised you have backslashes but not periods as an allowed character: while a Windows filename will never include backslashes, this character is still very inconvenient to work with in bash. – amon Oct 16 '13 at 09:31

Don't blame Windows for your problems. Linux is much more lax: the only characters it prohibits in file names are NUL and the slash /, which separates path components.

It isn't clear exactly what you are asking. Did you publish your code for a critique, or are you having problems with it?

As for the specific questions you asked:

  • What other characters besides spaces and parentheses should be translated?

    Windows allows any character in its filenames except the control characters from 0x00 to 0x1F and any of

    < > : " / \ | ? *

    DEL at 0x7F is fine.

    Within the ASCII set, besides letters, digits and space, that leaves these punctuation characters

    ! # $ % & ' ( ) + , - . ; = @ [ ] ^ _ ` { } ~

    The set of characters you need to translate depends on your reason for doing this. You may want to start by excluding non-ASCII characters, so your code should read something like

    $name2 =~ tr/\x21-\x7E/-/c;

    which will change all control characters, spaces, DEL and non-ASCII characters to hyphens. Then you need to go ahead and fix all the ASCII characters that you consider undesirable.

  • What other file attributes (besides file type (file/dir) and permissions) should be checked?

    The answer to this depends on your purpose. If you are referring only to whether renaming a file or directory as required is possible, then I suggest that you just let rename itself tell you whether it succeeded. It will return a false value if the operation failed, and the reason will be in $!.

  • Does Perl offer a pushd/popd equivalent, or is chdir a reasonable solution for traversing the directory tree?

    If you want to work with that idiom, then you should take a look at File::pushd (from CPAN), which allows you to temporarily chdir to a new location. A popd is done implicitly when the object returned by pushd goes out of scope, typically at the end of the enclosing block.
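Pulling the three points together, here is a minimal sketch that avoids chdir entirely by walking the tree with the core File::Find module. This is a swapped-in technique, not what the answer prescribes. finddepth() visits a directory's contents before the directory itself, so children are renamed while their parent's old path is still valid. `clean_name` and `clean_tree` are hypothetical names, and the second tr/// list of "undesirable" characters is an assumption to adjust. Because readdir returns bytes, a UTF-8 character such as é becomes two hyphens.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(fileparse);

# Hypothetical cleaning rule. Step 1 is the tr/// suggested above;
# the character list in step 2 is an assumption, adjust to taste.
sub clean_name {
    my ($name) = @_;
    $name =~ tr/\x21-\x7E/-/c;              # controls, space, DEL, non-ASCII bytes
    $name =~ tr/!"#$&'()*;<>?[\\]`{|}~/-/;  # shell-awkward ASCII (assumed set)
    return $name;
}

# Walk bottom-up with File::Find instead of chdir: finddepth() reports a
# directory's entries before the directory itself.
sub clean_tree {
    my ($top) = @_;
    finddepth({
        no_chdir => 1,                      # $_ is the full path; cwd never changes
        wanted   => sub {
            my $path = $_;
            return if $path eq $top;        # leave the starting directory alone
            my ($base, $dir) = fileparse($path);
            my $new = $dir . clean_name($base);
            return if $new eq $path || -e $new;   # unchanged, or would collide
            # rename() returns false on failure and leaves the reason in $!
            rename($path, $new)
                or warn "Cannot rename '$path' to '$new': $!\n";
        },
    }, $top);
}

clean_tree($_) for @ARGV;
```

For example, a tree containing `a b/c (d).txt` ends up as `a-b/c--d-.txt`: the file is renamed first, then its directory.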

I hope this helps. If you have any other specific questions then please make them known by editing your original post.

Borodin