remove extraneous characters from a filename

Question

I have been given a task that is a little over my head: take a repository of files, remove the excess garbage characters from each filename, and save the renamed file in a different directory.

Examples of the filenames:

100-expresstoll.pdf
1000-2012-09-29.jpg
10000-2014-01-15_14.03.22.jpg
10001-2014-01-15_19.05.24.jpg
10002-2014-01-15_21.30.23.jpg
10003-2014-01-16_07.33.54.jpg
10004-2014-01-16_13.33.21.jpg
10005-Feb 4, 2014.jpeg
10006-O'Reilly_Media,_Inc..pdf

The first group of digits at the beginning of each name is the record ID and must be retained, along with the file's extension. Everything between the record ID and the file extension needs to be dropped.

For example, the final names for the first three files would be:

100.pdf
1000.jpg
10000.jpg

I have read "Removing characters" and "Rearranging filenames" in addition to other postings, but the combination of a variable-length ID at the front, a variable number of intermediate characters to remove, and varying file extensions has put this beyond my limited PowerShell reach.

Possible duplicate of [Use Regex / Powershell to rename files](http://stackoverflow.com/questions/5574648/use-regex-powershell-to-rename-files) — JamesQMurphy, Apr 02 '17 at 02:10

Don Cruickshank · Answer 1 · 2017-04-02T02:01:47.010

3

You can use the -replace operator to do this kind of string manipulation:

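Get-ChildItem | foreach {
    # Full path of the existing item
    $old_name = $_.FullName
    # Keep the leading digits ($1) and the extension ($2); drop everything in between
    $new_name = $_.Name -replace '([0-9]+).*(\.[^.]*)$', '$1$2'

    Rename-Item $old_name $new_name
}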

The regular expression is the trick here:

([0-9]+) means match a series of digits (1 or more digits)
.* means match anything
(\.[^.]*) means match a period followed by any characters other than a period
$ means that the match must reach the end of the string

The first and third parts are wrapped in parentheses, which makes them capture groups: the text they match can be referenced in the replacement string using dollar notation ($1 for the digits, $2 for the extension).
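To see the pattern at work without touching any files, the same -replace call can be applied to plain strings. This is only a sketch: the `$samples` and `$renamed` variable names are mine, and the sample names are copied from the question.

```powershell
# Sample names taken from the question; no files are created or renamed.
$samples = @(
    '100-expresstoll.pdf',
    '10000-2014-01-15_14.03.22.jpg',
    "10006-O'Reilly_Media,_Inc..pdf"
)

# Keep capture group 1 (the leading digits) and group 2 (the final extension).
$renamed = $samples | ForEach-Object { $_ -replace '([0-9]+).*(\.[^.]*)$', '$1$2' }
$renamed
```

Because `.*` is greedy, the second group ends up holding only the last period and what follows it, so the three names come out as 100.pdf, 10000.jpg and 10006.pdf.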

edited Apr 02 '17 at 02:01

answered Apr 02 '17 at 01:46


Using `$new_name = ($_.Name -replace '([0-9]+).*$', '$1') + $_.Extension` would also work without having the regex find the extension string. – lit Apr 02 '17 at 02:23
I'm well aware of the Extension property - thanks. The intent behind my answer is to demonstrate that the `-replace` operator can be used to solve string manipulations in general. – Don Cruickshank Apr 02 '17 at 13:05

It is not a criticism. Your answer shows the use of regex to get everything. My note is just one more way to skin the cat. – lit Apr 02 '17 at 19:29
Indeed - I apologize if that came across as harsh. – Don Cruickshank Apr 03 '17 at 09:19

score 3 · Accepted Answer · answered Apr 02 '17 at 02:56

Another approach, without regular expressions. Both examples below use the risk-mitigation parameter -WhatIf for debugging purposes; remove it to perform the actual operation.

Rename files:

Get-ChildItem -File | ForEach-Object {
    $oldFile = $_.FullName
    $newName = $_.BaseName.Split('-')[0] + $_.Extension
    if ($_.Name -ne $newName) {
        Rename-Item -Path $oldFile -NewName $newName -WhatIf
    }
}

Rename and move files:

$newDest = 'D:\test'                       ### change to fit your circumstances
Get-ChildItem -File | ForEach-Object {
    $oldFile = $_.FullName
    $newName = $_.BaseName.Split('-')[0] + $_.Extension
    $newFile = Join-Path -Path $newDest -ChildPath $newName
    if ( -not ( Test-Path -Path $newFile ) ) {
        Move-Item -Path $oldFile -Destination $newFile -WhatIf
    }
}

mklement0 · Answer 3 · 2017-04-02T03:12:20.457

Probably the most idiomatic way of solving this is as follows (assumes that all files of interest - and no others - are in the current dir.):

Get-ChildItem -File | Rename-Item -NewName { ($_.BaseName -split '-')[0] + $_.Extension }

Add the common parameter -WhatIf to the Rename-Item command to preview the renaming operation.

Note that Rename-Item always renames items in their current location; to (also) move them, use Move-Item.
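Since the question asks for the renamed files to end up in a different folder, here is a minimal sketch of the Move-Item variant, using the same script-block technique for -Destination. It runs entirely inside a throwaway folder under the temp directory; the `$root`, `$source`, `$dest` and `$moved` names and the two empty sample files are assumptions made purely for the demo.

```powershell
# Hypothetical scratch folders under the temp directory, used only for this demo.
$root   = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$source = Join-Path $root 'source'
$dest   = Join-Path $root 'dest'
New-Item -ItemType Directory -Path $source, $dest | Out-Null

# Create two empty sample files named like the ones in the question.
'100-expresstoll.pdf', '1000-2012-09-29.jpg' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $source $_) | Out-Null
}

# Move each file; the script block computes the target path per input object.
Get-ChildItem -LiteralPath $source -File |
    Move-Item -Destination { Join-Path $dest (($_.BaseName -split '-')[0] + $_.Extension) }

# List what ended up in the destination folder.
$moved = Get-ChildItem -LiteralPath $dest -File | Select-Object -ExpandProperty Name
$moved
```

For real data, replace the scratch folders with your own source and target directories and append -WhatIf to the Move-Item call first to preview the result.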

If a target with the same name already exists, Rename-Item reports a non-terminating error for each such case (without aborting overall processing).
Note that this could also happen if an input filename contains no -, as that would result in an attempt to rename a file to itself.

Explanation:

Get-ChildItem -File outputs [System.IO.FileInfo] objects representing the files in the current directory, which are passed through the pipeline (|) to Rename-Item.
Passing a script block ({ ... }) to Rename-Item's -NewName parameter executes the contained code for each input object, where $_ represents the input object at hand.
- Note that this virtually undocumented but frequently used technique is called a script-block parameter [value], where a parameter that is designed to take pipeline input can be bound with a script block that processes the input indirectly.
($_.BaseName -split '-')[0] extracts the first `-`-separated token from each input filename's base name (filename without extension).
+, because the LHS is a string, performs string concatenation.
$_.Extension extracts the filename extension from each input filename.
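The split-and-concatenate expression can be tried on plain strings as well. In this sketch the [System.IO.Path] methods stand in for the BaseName and Extension properties of a real file object (my substitution, so that no files are needed); `$samples` and `$results` are illustrative names.

```powershell
# Sample names taken from the question; no files are touched.
$samples = '100-expresstoll.pdf', '1000-2012-09-29.jpg', '10005-Feb 4, 2014.jpeg'

$results = foreach ($name in $samples) {
    # Stand-ins for $_.BaseName and $_.Extension on a FileInfo object.
    $base = [System.IO.Path]::GetFileNameWithoutExtension($name)
    $ext  = [System.IO.Path]::GetExtension($name)

    # First token before a '-', followed by the original extension.
    ($base -split '-')[0] + $ext
}
$results
```

This yields 100.pdf, 1000.jpg and 10005.jpeg. A name without any - would come back unchanged, because the first token is then the whole base name.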

score 0 · Answer 4 · answered Apr 02 '17 at 02:10

I know this is not a PowerShell solution, but if you just want something that works, here is a cmd batch file. The ECHO in front of COPY only prints the commands; remove it to perform the actual copy.

SETLOCAL ENABLEDELAYEDEXPANSION

SET "OLDDIR=C:\Users\lit\files"
SET "NEWDIR=C:\Users\lit\newdir"

FOR /F "usebackq tokens=*" %%a IN (`DIR /A:-D /B "%OLDDIR%\*"`) DO (
    FOR /F "usebackq delims=- tokens=1" %%b IN (`ECHO %%a`) DO (SET "BN=%%b")
    SET "EXT=%%~xa"
    ECHO COPY /Y "%OLDDIR%\%%~a" "%NEWDIR%\!BN!!EXT!"
)
