
I have a few hundred PDF files whose names contain text that needs to be removed. Each file name contains several underscores; how many depends on how long the name is. My goal is to remove the text between the last _ and the .pdf file extension, including that last _.

For example I have:

  • AB_NAME_NAME_NAME_NAME_DS_123_EN_6.pdf
  • AC_NAME_NAME_NAME_DS_321_EN_10.pdf
  • AD_NAME_NAME_DS_321_EN_101.pdf

I would like the last underscore and the number after it (_6, _10, and _101 above) to be removed, so that the names become:

  • AB_NAME_NAME_NAME_NAME_DS_123_EN.pdf
  • AC_NAME_NAME_NAME_DS_321_EN.pdf
  • AD_NAME_NAME_DS_321_EN.pdf

I am a novice at PowerShell, but I have done some research and found the question "Powershell - Rename filename by removing the last few characters" helpful. It doesn't get me exactly what I need, though, because I cannot hardcode the number of characters to remove: the part to remove varies in length (2-4 characters).

Get-ChildItem 'C:\Path\here' -filter *.pdf | rename-item -NewName {$_.name.substring(0,$_.BaseName.length-3) + $_.Extension}

It seems like there may be a way to do this using .split or regex but I was not able to find a solution. Thanks.

crazymatt

3 Answers


You can use the LastIndexOf() method of the [string] class to get the index of the last instance of a character. In your case this should do it:

Get-ChildItem 'C:\Path\here' -filter *.pdf | rename-item -NewName { $_.BaseName.substring(0,$_.BaseName.lastindexof('_')) + $_.Extension }
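One caveat, shown as a sketch: if a name contains no _ at all, LastIndexOf() returns -1 and Substring(0, -1) throws, so Rename-Item reports an error for that file. The helper below (Remove-LastUnderscoreToken is a made-up name for illustration, not part of the answer) adds a guard and works on plain strings, so it can be tried without touching any files:

```powershell
# Hypothetical helper: strips the last "_token" from a base name and
# re-appends the extension, leaving names without '_' unchanged.
function Remove-LastUnderscoreToken {
    param([string] $BaseName, [string] $Extension)

    # LastIndexOf() returns -1 when there is no '_'; Substring(0, -1)
    # would throw, so return the name as-is in that case.
    $i = $BaseName.LastIndexOf('_')
    if ($i -lt 0) { return $BaseName + $Extension }

    # Keep everything before the last '_' and re-append the extension.
    $BaseName.Substring(0, $i) + $Extension
}

Remove-LastUnderscoreToken 'AB_NAME_NAME_NAME_NAME_DS_123_EN_6' '.pdf'   # AB_NAME_NAME_NAME_NAME_DS_123_EN.pdf
Remove-LastUnderscoreToken 'README' '.pdf'                               # README.pdf
```

Assuming the function is defined in the same session, it can be used in the pipeline as `Get-ChildItem 'C:\Path\here' -Filter *.pdf | Rename-Item -NewName { Remove-LastUnderscoreToken $_.BaseName $_.Extension } -WhatIf`.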
TheMadTechnician

Using the -replace operator with a regex enables a concise solution:

Get-ChildItem 'C:\Path\here' -Filter *.pdf | 
  Rename-Item -NewName { $_.Name -replace '_[^_]+(?=\.)' } -WhatIf

-WhatIf previews the renaming operation. Remove it to perform actual renaming.
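To check the regex before touching any files, it can be applied to the sample names from the question as plain strings; -replace applied to an array processes each element:

```powershell
# Sample names from the question, as plain strings (no files are touched).
$names = 'AB_NAME_NAME_NAME_NAME_DS_123_EN_6.pdf',
         'AC_NAME_NAME_NAME_DS_321_EN_10.pdf',
         'AD_NAME_NAME_DS_321_EN_101.pdf'

# -replace on an array operates on every element; with no replacement
# operand, each match is replaced with the empty string.
$newNames = $names -replace '_[^_]+(?=\.)'
$newNames
# AB_NAME_NAME_NAME_NAME_DS_123_EN.pdf
# AC_NAME_NAME_NAME_DS_321_EN.pdf
# AD_NAME_NAME_DS_321_EN.pdf
```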

  • _[^_]+ matches a _ character followed by one or more non-_ characters ([^_])

    • If you wanted to match more specifically by (decimal) digits only (\d), use _\d+ instead.
  • (?=\.) is a look-ahead assertion ((?=...)) that matches a literal . (\.), i.e., the start of the filename extension without including it in the match.

  • By not providing a replacement operand to -replace, it is implicitly the empty string that replaces what was matched, which effectively removes the last _-prefixed token before the filename extension.
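A small illustration of the difference between the generic pattern and the digits-only _\d+ variant, using a made-up name whose last token is not numeric (for example, a file that was already renamed once):

```powershell
# Made-up name whose last '_' token ('EN') is not numeric.
$name = 'AB_NAME_DS_123_EN.pdf'

# Generic pattern: removes the last _-prefixed token, whatever it contains.
$generic = $name -replace '_[^_]+(?=\.)'
# Digits-only pattern: no match here, because 'EN' is not made of digits.
$digitsOnly = $name -replace '_\d+(?=\.)'

$generic      # AB_NAME_DS_123.pdf
$digitsOnly   # AB_NAME_DS_123_EN.pdf (unchanged)
```

So the digits-only form is the safer choice if the rename might be run more than once on the same folder.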


You can make the regex more robust by also handling file names with "double" extensions; e.g., the above solution would replace filename a_bc.d_ef.pdf with a.d.pdf, i.e., perform two replacements. To prevent that, use the following regex instead:

$_.Name -replace '_[^_]+(?=\.[^.]+$)'

The look-ahead assertion now ensures that only the last extension matches: a literal . (\.) followed by one or more (+) characters other than literal . ([^.], a negated character set ([^...])) at the end of the string ($).
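Both patterns side by side on the double-extension example, applied to a plain string:

```powershell
$name = 'a_bc.d_ef.pdf'

# Look-ahead accepts any '.', so both '_bc' and '_ef' are removed.
$loose  = $name -replace '_[^_]+(?=\.)'
# Look-ahead requires the '.' of the last extension, so only '_ef' is removed.
$strict = $name -replace '_[^_]+(?=\.[^.]+$)'

$loose    # a.d.pdf
$strict   # a_bc.d.pdf
```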

mklement0
  • While this meets the OP's expressed requirements, a safer regex would, IMO, be `Rename-Item -NewName { $_.Name -replace '_[^_]+(?=\.pdf$)' } -WhatIf` or `Rename-Item -NewName { $_.Name -replace "_[^_]+(?=$($_.Extension)$)" } -WhatIf` – Mar 19 '19 at 12:50
  • Fair point, @LotPings, but I wanted to keep the solution short. Assuming that the input file names contain only _one_ `.` char - as in the sample file names - allowed me to do that. Please see my update for a solution that also handles multiple `.` instances correctly while still avoiding the need to repeat the input extension. – mklement0 Mar 19 '19 at 13:25

Just to show another alternative,

  • the part to remove from the Name is the last element of the BaseName split at _,
  • which is the element at negative index [-1] of the split result:
    Get-ChildItem 'C:\Path\here' -Filter *.pdf |%{$_.BaseName.Split('_')[-1]}
    6
    10
    101
  • as the split removes the _, it has to be prepended again before the token is removed.

Get-ChildItem 'C:\Path\here' -Filter *.pdf | 
   Rename-Item -NewName { $_.Name -replace '_'+$_.BaseName.split('_')[-1] } -whatif
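A string-only sketch of this split-based approach. Note that -replace treats its right-hand operand as a regex and replaces every match, so if the last token also occurs earlier in the name, it is removed there too; the second, made-up name below shows this:

```powershell
# Hypothetical base names; in the second one the last token '1' also appears earlier.
$results = foreach ($baseName in 'AC_NAME_NAME_NAME_DS_321_EN_10', 'AB_1_DS_1') {
    $name = $baseName + '.pdf'
    # Last '_'-separated element; Split() drops the '_' separator itself.
    $lastToken = $baseName.Split('_')[-1]
    # Re-add the '_' and remove every match of it, as in the answer above.
    $name -replace ('_' + $lastToken)
}
$results
# AC_NAME_NAME_NAME_DS_321_EN.pdf
# AB_DS.pdf   (both '_1' occurrences were removed)
```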

EDIT: a modified variant, which splits the BaseName at the underscore without removing the splitting character, uses the -split operator and a regex with a zero-length look-ahead:

> Get-ChildItem 'C:\Path\here' -Filter *.pdf |%{($_.BaseName -split'(?=_\d+)')[-1]}
_6
_10
_101

Get-ChildItem 'C:\Path\here' -Filter *.pdf | 
    Rename-Item -NewName { $_.Name -replace ($_.BaseName -split'(?=_)')[-1] } -whatif
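A string-only sketch of the look-ahead split. It also shows, with a made-up already-renamed name, that this '(?=_)' variant is not idempotent: a second run strips the next token as well:

```powershell
$baseName = 'AD_NAME_NAME_DS_321_EN_101'

# (?=_) matches the empty position before each '_', so every piece after
# the first keeps its leading '_'.
$parts = $baseName -split '(?=_)'
$parts[-1]                                   # _101

# The last piece already contains the '_', so it can be removed as is.
$newName = ($baseName + '.pdf') -replace $parts[-1]
$newName                                     # AD_NAME_NAME_DS_321_EN.pdf

# Running the same logic on the already-renamed base name strips '_EN' too.
$again = 'AD_NAME_NAME_DS_321_EN'
$secondRun = ($again + '.pdf') -replace ($again -split '(?=_)')[-1]
$secondRun                                   # AD_NAME_NAME_DS_321.pdf
```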
  • Thanks for adding this. I had been trying to get this to work originally using `BaseName.split('_')[-1]` but wasn't having much luck. I will test this one as well. – crazymatt Mar 21 '19 at 16:28
  • See the last, even simpler and more concise variant, as it removes only trailing digits, so it can be run repeatedly without removing nonnumeric parts. – Mar 21 '19 at 17:03