
I have 100+ files, created by a program, whose names contain hidden special characters. In Windows Explorer the filenames look correct, but copying a filename and pasting it into a program such as Notepad++ shows a ? on either end, i.e. ?filename?. Renaming a file manually (right-clicking, deleting the name, and retyping it) fixes the problem. To see the extra characters, I have to switch the encoding in Notepad++ from UTF-8 to ANSI. With help, I've identified the trailing '?' as character code 65279, a BOM (see "What is this char? 65279").

I need to load the files back into the program, but due to the hidden special characters, the program isn't seeing them to read in properly.

Is there a way to use PowerShell to scrub the filenames? Ideally, only the hidden special characters are removed, and the rest of each filename (including the underscores) is left alone. Filename collisions should not be an issue in the current situation, but an automatic overwrite would be an acceptable fallback if one did occur. The output filenames are generated by a JavaScript script containing the following:

var objName = f[myCounter].contents.replace(/ /g,"_").toLowerCase();
app.pngExportPreferences.pageString = curPage.name;
var myFilePath = myDoc.filePath + "/" + objName + ".png"; //export to a folder of the current document
var myFile = new File(myFilePath);
myDoc.exportFile(ExportFormat.PNG_FORMAT, myFile, false);

I'm including the script in case the problem is easier to solve there. I am very new to PowerShell and JavaScript.
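If the fix does belong in the export script, one option would be to strip the character before the filename is built. The following is only a minimal sketch, assuming the text field contents arrive as a string that carries the stray U+FEFF (decimal 65279) characters. `stripBom` and `makeObjName` are hypothetical helper names, not part of the original script, and the code uses only `var` and plain functions so it stays close to the style of the script above.

```javascript
// Hypothetical helper (not in the original script):
// removes every U+FEFF (decimal 65279) from a string.
function stripBom(text) {
    return String(text).replace(/\uFEFF/g, "");
}

// Mirrors the name-building line from the export script,
// with the U+FEFF strip applied first.
function makeObjName(contents) {
    return stripBom(contents).replace(/ /g, "_").toLowerCase();
}
```

Under that assumption, the first line of the export snippet would become `var objName = makeObjName(f[myCounter].contents);` and the rest would stay unchanged.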

I've tried a few PowerShell scripts I've found, including:

dir -Recurse | ?{$_.Name -match $re}  | %{ren -literalpath $_.FullName -newname (join-path (get-item $_.PSPArentPath) $($_.Name -replace $re,""))}

gci *.png | Rename-Item -NewName {$_ -replace '_*(\[.*?\]|\(.*?\))_*' -replace '_+', ' '}

They didn't remove the hidden special characters.

Luxi

2 Answers


To replicate a filename with the problem:

echo hi > ([char]65279 + 'hithere' + [char]65279 + '.txt')

Try this one. With -whatif in place, rename-item only reports what it would do and renames nothing. If the reported names look good, take off the -whatif so the rename actually takes effect.

dir | foreach {
  $name = $_.name
  $chars = [char[]]$name | where { $_ -in [char]' '..[char]'~' } # printable ascii
  $newname = -join $chars   # make a string again
  # $newname = $name -replace '[^ -~]'   # alternative
  if ($newname.length -lt $name.length) { # ascii name is smaller  
    $_ | rename-item -newname $newname -whatif
  }
}

Ref: http://facweb.cs.depaul.edu/sjost/it212/documents/ascii-pr.htm

js2010
  • Thank you for taking time out of your day for this. It half worked, I think. When I paste the filename string into Notepad++ after running it, it now reads as 'filename?' instead of '?filename?'. To see the extra characters, I have to switch the encoding in Notepad++ from UTF-8 to ANSI. – Luxi Aug 16 '19 at 22:49
  • Hmm, did you use the range 32..126? What happens if you copy and paste the weird character into powershell? Like `[int][char]'ó'` gives `243`. – js2010 Aug 16 '19 at 23:39
  • Yes, I used it exactly as provided. Pasting into powershell gives the same result as notepad++. If I use [int][char]'' on the offending question mark it gives code 65279. A little research shows that this is a BOM character. I'm looking more into it right now. https://stackoverflow.com/questions/6784799/what-is-this-char-65279 – Luxi Aug 17 '19 at 00:04
  • It works for me. I made a file like this: `echo hi > ([char]65279 + 'hithere')`. Then the script renamed it. Did you take the `-whatif` off rename-item? – js2010 Aug 17 '19 at 00:14
  • No, but that part seems to work fine (the leading 65279). I have no idea if this is right, but try: 'echo hi > ([char]65279 + 'hithere' + [char]65279 + '.txt')' instead as a test file. – Luxi Aug 17 '19 at 02:13
  • `echo hi > ([char]65279 + 'hithere' + [char]65279 + '.txt')` – Luxi Aug 17 '19 at 02:20
  • I confirmed that I get the same behavior from the above script on the test file generated above that I get from all the other files. The second 65279 remains but the first one is removed. – Luxi Aug 17 '19 at 02:24
  • I'm sorry, I misread your comment. I am now trying with the -whatif removed. It seems to work as you describe and I'm going to go try it on the large batch of files now. Thank you! I think this is the solution. – Luxi Aug 17 '19 at 02:27
  • Confirmed this works on a larger scale. Thank you, again. I spent roughly 14 hours just trying to figure out why the files weren't reading in. Today you have saved a workflow, a weekend, and most likely a keyboard from certain bashing. – Luxi Aug 17 '19 at 02:46
  • How did it happen? – js2010 Aug 17 '19 at 03:06
  • A script I used in an adobe product to output images from a series of pages names the files based on text fields. I'm going to see if I can figure out what the problem is there. The entire process is something I'm going to have to do several times, and this is a very workable solution vs having to manually rename every file every iteration. – Luxi Aug 17 '19 at 03:58
  • Sounds like each field is like a little utf8 encoded file with a BOM. – js2010 Aug 17 '19 at 05:48

The following script could help. Based on Character classes in regular expressions.

Regex updated to '\p{IsGeneralPunctuation}|\ufeff' after you identified the problematic character as U+FEFF Zero Width No-Break Space.
Should work for most file names, even for non-ascii ones (see Naming Conventions).

Get-ChildItem -Recurse -File |
    ForEach-Object {
        $strange = $_.Name
        $string  = $strange -creplace '\p{IsGeneralPunctuation}|\ufeff'
        if ( $strange.Length -ne $string.Length ) {
            'strange {0,3} {1}' -f $strange.Length, $strange
            'string  {0,3} {1}' -f $string.Length,  $string
            $_ | Rename-Item -NewName $string -WhatIf
        }
    }
JosefZ
  • Thank you for taking time out of your day for this. It runs fine and has the same issue as js2010's answer, in that it appears to capture the leading special character but not the trailing one. When I paste the filename string into Notepad++ after running it, it now reads as 'filename?' instead of '?filename?'. To see the extra characters, I have to switch the encoding in Notepad++ from UTF-8 to ANSI. – Luxi Aug 16 '19 at 23:21
  • Thank you @JosefZ, I don't know what -Whatif does, but just like @js2010's code, if I remove the -Whatif statement from your script it will correctly delete both of the problematic characters. If the -Whatif statement is included it will only delete the first of the problematic characters. This code generates a file representing the problem I encountered: `echo hi > ([char]65279 + 'hithere' + [char]65279 + '.txt')` Thank you very much for your time on this! – Luxi Aug 17 '19 at 04:06
  • **`Get-Help about_CommonParameters`**: "In addition to the common parameters, many cmdlets offer the `WhatIf` and `Confirm` risk mitigation parameters. Cmdlets that involve risk to the system or to user data usually offer these parameters." Read **`Get-Help Rename-Item -online`** or **`Get-Help Rename-Item -Parameter WhatIf`** as well: `-WhatIf []` Shows what would happen if the cmdlet runs. The cmdlet is not run. – JosefZ Aug 17 '19 at 09:59