
I have a script that extracts metadata from each file in a directory. When the file path contains no diacritics, the script produces a CSV file that looks like this:

[Screenshot: CSV output with the FileHash column populated]

When the file path includes a diacritic (e.g. "TéstMé.txt"), the CSV file has blanks in the FileHash field:

[Screenshot: CSV output with an empty FileHash column]

My question is: how do I get this script to work regardless of diacritics in the filepath?

  • I have determined that the problem is not with the Get-FileHash part of the script: when I run the single line `Get-FileHash "C:\Temp\New\TéstMé.txt"`, a hash is produced.
  • I have also determined that replacing `FileHash = Get-FileHash -Path` with `FileHash = Get-FileHash -LiteralPath` is not a solution, as it also produces a blank.
  • I tried to change the regex in the line `($_.Trim() -match "^(?<Children>\d+)\s+(?<FullName>.*)") {` in case it was blocking diacritics, but any change brought up `WARNING: parsing [unique parsing error here]`.
  • I also tried changing `ValueFromPipeline=$True,ValueFromPipelineByPropertyName=$True` from `$true` to `$false` (in case the pipeline was changing the file path value), but that had no effect.
  • I thought Robocopy (which the script uses) might be unable to handle files with diacritics, but `Robocopy C:\Temp\New C:\Temp\star` copies the files fine.
  • I do have a regex for identifying illegal characters (obtained from here), but I don't know how to incorporate it into the script.
  • FYI: I cannot change the actual file names. I would love to do a find-and-replace on every letter with a diacritic, but that option isn't open to me.
Function Get-FolderItem {
    
        [cmdletbinding(DefaultParameterSetName='Filter')]
        Param (
            [parameter(Position=0,ValueFromPipeline=$True,ValueFromPipelineByPropertyName=$True)]
            [Alias('FullName')]
            [string[]]$Path = $PWD,
            [parameter(ParameterSetName='Filter')]
            [string[]]$Filter = '*.*',    
            [parameter(ParameterSetName='Exclude')]
            [string[]]$ExcludeFile,              
            [parameter()]
            [int]$MaxAge,
            [parameter()]
            [int]$MinAge
        )
        Begin {
            $params = New-Object System.Collections.Arraylist
            $params.AddRange(@("/L","/E","/NJH","/BYTES","/FP","/NC","/XJ","/R:0","/W:0","T:W"))
            If ($PSBoundParameters['MaxAge']) {
                $params.Add("/MaxAge:$MaxAge") | Out-Null
            }
            If ($PSBoundParameters['MinAge']) {
                $params.Add("/MinAge:$MinAge") | Out-Null
            }
        }
        Process {
            ForEach ($item in $Path) {
                Try {
                    $item = (Resolve-Path -LiteralPath $item -ErrorAction Stop).ProviderPath
                    If (-Not (Test-Path -LiteralPath $item -Type Container -ErrorAction Stop)) {
                        Write-Warning ("{0} is not a directory and will be skipped" -f $item)
                        Return
                    }
                    If ($PSBoundParameters['ExcludeFile']) {
                        $Script = "robocopy `"$item`" NULL $Filter $params /XF $($ExcludeFile  -join ',')"
                    } Else {
                        $Script = "robocopy `"$item`" NULL $Filter $params"
                    }
                    Write-Verbose ("Scanning {0}" -f $item)
                    Invoke-Expression $Script | ForEach {
                        Try {
                            If ($_.Trim() -match "^(?<Children>\d+)\s(?<FullName>.*)") {
                                $object = New-Object PSObject -Property @{
                                    FullName = $matches.FullName
                                    Extension = $matches.fullname -replace '.*\.(.*)','$1'
                                    FullPathLength = [int] $matches.FullName.Length
                                    FileHash = Get-FileHash -LiteralPath "\\?\$($matches.FullName)" |Select -Expand Hash
                                    Created = ([System.IO.FileInfo] $matches.FullName).creationtime
                                    LastWriteTime = ([System.IO.FileInfo] $matches.FullName).LastWriteTime
                                    
                                } 
                                $object.pstypenames.insert(0,'System.IO.RobocopyDirectoryInfo')
                                Write-Output $object
                            } Else {
                                Write-Verbose ("Not matched: {0}" -f $_)
                            }
                        } Catch {
                            Write-Warning ("{0}" -f $_.Exception.Message)
                            Return
                        }
                    }
                } Catch {
                    Write-Warning ("{0}" -f $_.Exception.Message)
                    Return
                }
            }
        }
    }
    
 Get-FolderItem "C:\Temp\New" | Export-Csv -Path C:\Temp\testesting.csv
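A minimal diagnostic sketch, assuming the path string reaches `Get-FileHash` with its accented letters already replaced by `?` (iRon's comment below points out `C:\Temp\New\T?stM?.txt` in the results). The function name `Test-MangledPath` and the throwaway temp directory are hypothetical, and the file name is built from `[char]0x00E9` so the check does not depend on the encoding the script file is saved in. It hashes the real file, then repeats the lookup on a copy of the path with every non-ASCII character swapped for `?`:

```powershell
# Hypothetical diagnostic: hash a file whose name contains 'é', then try the
# same lookup with every non-ASCII character replaced by '?', which is what a
# mis-decoded path string looks like.
function Test-MangledPath {
    param([Parameter(Mandatory)][string]$Directory)

    # Build "TéstMé.txt" from a code point so the script file's encoding does not matter
    $e = [char]0x00E9
    $realPath = Join-Path $Directory "T${e}stM${e}.txt"
    Set-Content -LiteralPath $realPath -Value 'sample'

    # Simulate the mangled string: replace each character outside ASCII with '?'
    $mangledPath = $realPath -replace '[^\x00-\x7F]', '?'

    $mangledHash = $null
    try {
        $mangledHash = (Get-FileHash -LiteralPath $mangledPath -ErrorAction Stop).Hash
    }
    catch {
        # Expected: the mangled name does not exist (or is not a valid path on Windows)
    }

    [pscustomobject]@{
        RealPath      = $realPath
        RealHash      = (Get-FileHash -LiteralPath $realPath).Hash
        MangledPath   = $mangledPath
        MangledExists = [System.IO.File]::Exists($mangledPath)
        MangledHash   = $mangledHash
    }
}

# Run the check in a throwaway directory, then clean up
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$result = Test-MangledPath -Directory $dir
Remove-Item -LiteralPath $dir -Recurse -Force
$result
```

If `RealHash` is filled while `MangledExists` is `False` and `MangledHash` is empty, that matches the blank FileHash column: the file itself is fine, but the name handed to `Get-FileHash` no longer exists.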


oymonk
  • You may try the parameter `-Encoding` with the value `UTF8` for the cmdlet `Export-Csv` to preserve the diacritics. – Olaf Jul 24 '21 at 01:10
  • Good suggestion. I tried it, but it didn't work (still no file hash in the output). But you've given me an idea: maybe there is a similar `-Encoding` switch for a command earlier in the script. – oymonk Jul 24 '21 at 01:22
  • Notice the full path in the results: `C:\Temp\New\T?stM?.txt`. That means the issue is clearly in your `regex` part. I guess the solution is to [**Re-save your script as UTF-8 with BOM.**](https://stackoverflow.com/a/54790355/1701026) or to try setting the console encoding: [`[Console]::OutputEncoding = [System.Text.Encoding]::UTF8`](https://stackoverflow.com/a/35573326/1701026) – iRon Jul 24 '21 at 06:29
  • Troubleshooting tip: remove the `Try`/`Catch` statements while debugging. I'll bet it will give you a hint where the actual issue lies. (I suspect something like a `File not found` error on the `Get-FileHash`.) – iRon Jul 24 '21 at 06:43
  • Thanks iRon. I tried putting `[Console]::OutputEncoding = [System.Text.Encoding]::UTF8` in my script, saving it as foo.ps1, and then making a new script with `ExecuteCommand("PowerShell.exe", "-File C:\\Temp\\foo.ps1", Environment.CurrentDirectory, DumpBytes, DumpBytes);`, but I got the error `missing expression.` Then I got rid of `Environment.CurrentDirectory` and got the error `"OutputEncoding": "The handle is invalid.` The try/catch tip is helpful, thanks. – oymonk Jul 24 '21 at 18:42
  • When sending strings to an external program (like robocopy), the encoding setting to change is `$OutputEncoding`. You can try setting `$OutputEncoding = [Text.Encoding]::UTF8` first. – AdminOfThings Jul 28 '21 at 14:44
  • @oymonk The trouble is that robocopy's output is not Unicode; you can write it to a log instead and force the log to be Unicode. – JPBlanc Jul 30 '21 at 09:54
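The console-encoding suggestions can be tried without permanently changing the session, using a small wrapper that switches `[Console]::OutputEncoding` (the setting PowerShell uses to decode a native program's output; `$OutputEncoding` governs what PowerShell sends *to* the program) and restores it afterwards. This is only a sketch: the name `Invoke-WithConsoleEncoding` is hypothetical, the thread does not confirm that UTF-8 matches what robocopy actually writes, and, as oymonk found, setting the property can fail with "The handle is invalid" when no console is attached.

```powershell
# Hypothetical helper: run a script block with [Console]::OutputEncoding
# temporarily switched, then restore the previous value even if the block fails.
function Invoke-WithConsoleEncoding {
    param(
        [Parameter(Mandatory)][System.Text.Encoding]$Encoding,
        [Parameter(Mandatory)][scriptblock]$ScriptBlock
    )
    $previous = [Console]::OutputEncoding
    try {
        # PowerShell decodes a native program's captured output using this setting
        [Console]::OutputEncoding = $Encoding
        & $ScriptBlock
    }
    finally {
        [Console]::OutputEncoding = $previous
    }
}

# Example (Windows only, not run here): capture a robocopy listing with the
# flags from the question while the console encoding is switched.
# $lines = Invoke-WithConsoleEncoding -Encoding ([System.Text.Encoding]::UTF8) -ScriptBlock {
#     robocopy 'C:\Temp\New' NULL *.* /L /E /NJH /BYTES /FP /NC /XJ /R:0 /W:0
# }
```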

1 Answer


Here is a solution: I write the Robocopy output to a Unicode log using the `/UNILOG:c:\temp\test.txt` parameter, then read that log back with `Get-Content` and parse it with the same code as before.

Function Get-FolderItem {
    
        [cmdletbinding(DefaultParameterSetName='Filter')]
        Param (
            [parameter(Position=0,ValueFromPipeline=$True,ValueFromPipelineByPropertyName=$True)]
            [Alias('FullName')]
            [string[]]$Path = $PWD,
            [parameter(ParameterSetName='Filter')]
            [string[]]$Filter = '*.*',    
            [parameter(ParameterSetName='Exclude')]
            [string[]]$ExcludeFile,              
            [parameter()]
            [int]$MaxAge,
            [parameter()]
            [int]$MinAge
        )
        Begin {
            $params = New-Object System.Collections.Arraylist
            $params.AddRange(@("/L","/E","/NJH","/BYTES","/FP","/NC","/XJ","/R:0","/W:0","T:W","/UNILOG:c:\temp\test.txt"))
            If ($PSBoundParameters['MaxAge']) {
                $params.Add("/MaxAge:$MaxAge") | Out-Null
            }
            If ($PSBoundParameters['MinAge']) {
                $params.Add("/MinAge:$MinAge") | Out-Null
            }
        }
        Process {
            ForEach ($item in $Path) {
                Try {
                    $item = (Resolve-Path -LiteralPath $item -ErrorAction Stop).ProviderPath
                    If (-Not (Test-Path -LiteralPath $item -Type Container -ErrorAction Stop)) {
                        Write-Warning ("{0} is not a directory and will be skipped" -f $item)
                        Return
                    }
                    If ($PSBoundParameters['ExcludeFile']) {
                        $Script = "robocopy `"$item`" NULL $Filter $params /XF $($ExcludeFile  -join ',')"
                    } Else {
                        $Script = "robocopy `"$item`" NULL $Filter $params"
                    }
                    Write-Verbose ("Scanning {0}" -f $item)
                    Invoke-Expression $Script | Out-Null
                    get-content "c:\temp\test.txt" | ForEach {
                        Try {
                            If ($_.Trim() -match "^(?<Children>\d+)\s(?<FullName>.*)") {
                                $object = New-Object PSObject -Property @{
                                    FullName = $matches.FullName
                                    Extension = $matches.fullname -replace '.*\.(.*)','$1'
                                    FullPathLength = [int] $matches.FullName.Length
                                    FileHash = Get-FileHash -LiteralPath "\\?\$($matches.FullName)" |Select -Expand Hash
                                    Created = ([System.IO.FileInfo] $matches.FullName).creationtime
                                    LastWriteTime = ([System.IO.FileInfo] $matches.FullName).LastWriteTime
                                    
                                } 
                                $object.pstypenames.insert(0,'System.IO.RobocopyDirectoryInfo')
                                Write-Output $object
                            } Else {
                                Write-Verbose ("Not matched: {0}" -f $_)
                            }
                        } Catch {
                            Write-Warning ("{0}" -f $_.Exception.Message)
                            Return
                        }
                    }
                } Catch {
                    Write-Warning ("{0}" -f $_.Exception.Message)
                    Return
                }
            }
        }
    }
    
 Get-FolderItem "C:\Temp\New" | Export-Csv -Path C:\Temp\testtete.csv -Encoding Unicode

JPBlanc
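A variation on the same idea, sketched under a few assumptions: Windows with `robocopy` on the PATH, and hypothetical function names `ConvertFrom-RobocopyLine` and `Get-RobocopyFileList`. Instead of the fixed `c:\temp\test.txt`, it logs to a temporary file from `[System.IO.Path]::GetTempFileName()`, calls robocopy with `&` and an argument array rather than `Invoke-Expression`, reads the log with `Get-Content -Encoding Unicode`, adds `/NJS` and `/NDL` so only file lines are logged, and loosens `\s` to `\s+` in the regex. It returns only the size and full path; the hash and timestamps can be added as in the answer.

```powershell
# Sketch: list files with robocopy into a temporary Unicode log (/UNILOG) and parse it.
# Assumptions: Windows with robocopy on PATH; both function names are hypothetical.

function ConvertFrom-RobocopyLine {
    # Parses one listing line of the form "<size><whitespace><full path>",
    # the shape the regex in Get-FolderItem expects; other lines are skipped.
    param([Parameter(ValueFromPipeline)][string]$Line)
    process {
        if ($Line.Trim() -match '^(?<Size>\d+)\s+(?<FullName>.+)$') {
            [pscustomobject]@{
                Size     = [long]$Matches.Size
                FullName = $Matches.FullName
            }
        }
    }
}

function Get-RobocopyFileList {
    param([Parameter(Mandatory)][string]$Directory)

    $log = [System.IO.Path]::GetTempFileName()
    try {
        # Call operator with an argument array instead of Invoke-Expression,
        # so paths containing spaces or special characters are passed intact.
        $robocopyArgs = @($Directory, 'NULL', '*.*', '/L', '/E', '/NJH', '/NJS',
                          '/BYTES', '/FP', '/NC', '/NDL', '/XJ', '/R:0', '/W:0', "/UNILOG:$log")
        & robocopy @robocopyArgs | Out-Null

        # The /UNILOG file is Unicode; read it back explicitly as such.
        Get-Content -LiteralPath $log -Encoding Unicode | ConvertFrom-RobocopyLine
    }
    finally {
        Remove-Item -LiteralPath $log -ErrorAction SilentlyContinue
    }
}
```

With this, `Get-RobocopyFileList -Directory 'C:\Temp\New'` should yield `TéstMé.txt` with its accents intact, so `Get-FileHash -LiteralPath $_.FullName` can find the file. The parsing assumes the log lines have the same size-then-path layout the original regex relies on.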