
I have the following code that works for most files. The input file (FoundLinks.csv) is a UTF-8 file with one file path per line: the full paths of files on a particular drive that I need to process.

$inFiles = @()
$inFiles += @(Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv")

foreach ($inFile in $inFiles) {
    Write-Host("Processing: " + $inFile)
    $objFile = Get-ChildItem -LiteralPath $inFile
    New-Object PSObject -Prop @{ 
        FullName = $objFile.FullName
        ModifyTime = $objFile.LastWriteTime
    }
} 

But even though I've used -LiteralPath, it continues to not be able to process files that have a non-breaking space in the file name.

Processing: q:\Executive\CLC\Budget\Co  2018 Budget - TO Bob (GA Prophix).xlsx
Get-ChildItem : Cannot find path 'Q:\Executive\CLC\Budget\Co  2018 Budget - TO Bob (GA Prophix).xlsx'
because it does not exist.
At ListFilesWithModifyTime.ps1:6 char:29
+     $objFile = Get-ChildItem <<<<  -LiteralPath $inFile
    + CategoryInfo          : ObjectNotFound: (Q:\Executive\CL...A Prophix).xlsx:String) [Get-ChildItem], ItemNotFound
   Exception
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand

I know my input file has the non-breaking space in the path because I'm able to open it in Notepad, copy the offending path, paste into Word, and turn on paragraph marks. It shows a normal space followed by a NBSP just before 2018.

Is PowerShell not reading in the NBSP? Am I passing it wrong to -LiteralPath? I'm at my wit's end. I saw this solution, but in that case they are supplying the path as a literal in the script, so I can't see how I could use that approach.

I've also tried the -Encoding UTF8 parameter on Get-Content, but it made no difference.

I'm not even sure how I can check $inFile in the code just to confirm if it still contains the NBSP.

Grateful for any help to get unstuck!

Confirmed that $inFile has NBSP

Thank you all! As per @TheMadTechnician, I have updated the code like this, and also reduced my input file to only the one file having a problem.

$inFiles = @()
$inFiles += @(Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" -Encoding UTF8)

foreach ($inFile in $inFiles) {
    Write-Host("Processing: " + $inFile)

    # list out all chars to confirm it has an NBSP
    $inFile.ToCharArray()|%{"{0} -> {1}" -f $_,[int]$_}

    $objFile = Get-ChildItem -LiteralPath $inFile
    New-Object PSObject -Prop @{ 
        FullName = $objFile.FullName
        ModifyTime = $objFile.LastWriteTime
    }
} 

And so now I can confirm that $inFile in fact still contains the NBSP just as it gets passed to Get-ChildItem. Yet Get-ChildItem says the file does not exist.
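
For a quicker check than listing every character, a minimal sketch along these lines should also work. The sample string is a made-up stand-in for one line of the input file, and `$nbspIndex` is just an illustrative variable name; `IndexOf` returns -1 when the character is absent.

```powershell
# Hypothetical sample standing in for one line read from FoundLinks.csv.
$inFile = 'Co ' + [char]0xa0 + '2018 Budget.xlsx'

# String.IndexOf(char) returns the zero-based position of U+00A0, or -1 if it is not present.
$nbspIndex = $inFile.IndexOf([char]0xa0)

if ($nbspIndex -ge 0) {
    "NBSP found at index $nbspIndex"
} else {
    "No NBSP in this path"
}
```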

More I've tried:

  • Same if I use Get-Item instead of Get-ChildItem
  • Same if I use -Path instead of -LiteralPath
  • Windows explorer and Excel can deal with the file successfully.

I'm on a Windows 7 machine, PowerShell 2.

Thanks again for all the responses!

Sandra
  • Why are you using `-LiteralPath`? I don't see anything that can be construed as wildcard language in your path. – Maximilian Burszley Jun 07 '18 at 00:12
  • You can check for the NBSP, but I would suggest finding one of the names that has it and check that one specifically. Let us say that `$inFiles[4]` (the 5th file) has a NBSP. You can run this, and look for what should be the NBSP, and see if the number next to it is the same as the adjacent space: `$inFiles[4].ToCharArray()|%{"{0} -> {1}" -f $_,[int]$_}` – TheMadTechnician Jun 07 '18 at 00:59
  • @TheIncorrigible1: For both robustness and conceptual clarity it is always worth using `-LiteralPath` when you know you're dealing with literal paths. – mklement0 Jun 07 '18 at 02:05
  • @mklement0 I'm aware of that, but I think the OP had the wrong idea of its use-case. – Maximilian Burszley Jun 07 '18 at 02:06
  • @TheIncorrigible1: The question talks about the input file containing full paths and using `-LiteralPath` to access the files identified by those paths - that sounds like the right idea to me. – mklement0 Jun 07 '18 at 02:11

3 Answers

It's still unclear why Sandra's code didn't work: PowerShell v2+ is capable of retrieving files with paths containing non-ASCII characters; perhaps a non-NTFS filesystem with different character encoding was involved?

However, the following workaround turned out to be effective:

$objFile = Get-ChildItem -Path ($inFile -replace ([char] 0xa0), '?')
  • The idea is to replace the non-breaking space char. (Unicode U+00A0; hex. 0xa0) in the input file path with wildcard character ?, which represents any single char.

  • For Get-ChildItem to perform wildcard matching, -Path rather than -LiteralPath must be used (note that -Path is actually the default if you pass a path argument positionally, as the first argument).

  • Hypothetically, the wildcard-based paths could match multiple files; if that were the case, the individual matches would have to be examined to identify the specific match that has a non-breaking space in the position of the ?.
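
To guard against that multiple-match case, a sketch like the following could filter the wildcard results down to the exact path. The function name `Resolve-NbspPath` is hypothetical, and the sketch assumes the input is a full path that contains no other wildcard characters (`*`, `?`, `[`, `]`). It compares with an ordinal, case-insensitive comparison so that a regular space and a non-breaking space are never treated as equal.

```powershell
function Resolve-NbspPath {
    param([string]$Path)

    # Replace every non-breaking space (U+00A0) with the single-character wildcard '?'.
    $pattern = $Path.Replace([char]0xa0, [char]'?')

    # -Path (not -LiteralPath) enables wildcard matching; keep only the candidate
    # whose full path is identical to the requested one, ignoring case.
    Get-ChildItem -Path $pattern | Where-Object {
        [string]::Equals($_.FullName, $Path, [System.StringComparison]::OrdinalIgnoreCase)
    }
}
```

Inside the loop from the question, this would be called as `$objFile = Resolve-NbspPath -Path $inFile`.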

mklement0
  • Thanks! I should have said before, Q: drive is a DFS path. The folder in question is on a NetApp device that presents as a Windows file server. But I still don't see the root cause, because the way I got the list of Excel files in the first place was using a cmd /u window and dir /b. I then processed all Excel files with links to other Excel files, resulting in the list I needed PowerShell to process now. So Command Prompt and Excel Interop were able to list / process those files on the same drive. Big *shrug*. I'm just so glad it's resolved and I can move forward. – Sandra Jun 07 '18 at 21:11
  • @Sandra: Thanks for letting us know; if you feel inspired to pursue this further, see if PowerShell too is able to retrieve the filenames correctly by enumeration; e.g. with `Get-ChildItem -File -Name q:\Executive\CLC\Budget > list.txt` – mklement0 Jun 07 '18 at 22:34
  • Thanks, @briantist. – mklement0 Jun 08 '18 at 02:15

Get-ChildItem is for listing children so you would be giving it a directory, but it seems you are giving it a file, so when it says it cannot find the path, it's because it can't find a directory with that name.

Instead, you would want to use Get-Item -LiteralPath to get each individual item (these would be the same items you would get if you ran Get-ChildItem on the parent directory).

I think swapping in Get-Item would make your code work as is.

After testing, I think the above is in fact false, so sorry for that, but I will leave the below in case it's helpful, even though it may not solve your immediate problem.


But let's take a look at how it can be simplified with the pipeline.

First, you're starting with an empty array, then calling a command (Get-Content) which likely already returns an array, wrapping that in an array, then concatenating it to the empty one.

You could just do:

$inFiles = Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv"

Yes, there is a chance that $inFiles will contain only a single item and not an array at all.

But the nice thing is that foreach won't mind one bit!

You can do something like this and it just works:

foreach ($string in "a literal single string") {
    Write-Host $string
}
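
If you do need a real array (for instance to read `.Count`, which a lone string does not reliably expose in older PowerShell versions), wrapping the command in `@()` is enough. A minimal sketch, using a made-up single-item pipeline in place of Get-Content; the variable names are only illustrative:

```powershell
# A pipeline that emits exactly one string yields a scalar string...
$scalar = 'a single path' | ForEach-Object { $_ }

# ...while wrapping the same pipeline in @() always yields an array.
$array = @('a single path' | ForEach-Object { $_ })

$scalar -is [array]   # False
$array -is [array]    # True
$array.Count          # 1
```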

But Get-Item (and Get-ChildItem for that matter) accept pipeline input, so they accept multiple items.

That means you could do this:

$inFiles = Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" | Get-Item

foreach ($inFile in $inFiles) {
    Write-Host("Processing: " + $inFile)
    New-Object PSObject -Prop @{ 
        FullName = $inFile.FullName
        ModifyTime = $inFile.LastWriteTime
    }
} 

But even more than that, there is a pipeline-aware cmdlet for processing items, called ForEach-Object, to which you pass a [ScriptBlock], in which $_ represents the current item, so we could do it like this:

Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" | 
    Get-Item |
    ForEach-Object -Process {
        Write-Host("Processing: " + $_)
        New-Object PSObject -Prop @{ 
            FullName = $_.FullName
            ModifyTime = $_.LastWriteTime
        }
    }

All in one pipeline!

But further, you're creating a new object with the 2 properties you want.

PowerShell has a nifty cmdlet called Select-Object which takes an input object and returns a new object containing only the properties you want; this would make for a cleaner syntax:

Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" | 
    Get-Item |
    Select-Object -Property FullName,LastWriteTime
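
Note that this version outputs a property named LastWriteTime rather than ModifyTime. If the original property name matters, a calculated property can rename it. The sketch below uses a temporary file as a hypothetical stand-in for one path from the input file:

```powershell
# Hypothetical stand-in for one path read from FoundLinks.csv.
$samplePath = [System.IO.Path]::GetTempFileName()

# The hashtable defines a calculated property: Name is the output property name,
# Expression is a script block evaluated against each input object.
$result = Get-Item -LiteralPath $samplePath |
    Select-Object -Property FullName, @{ Name = 'ModifyTime'; Expression = { $_.LastWriteTime } }

$result

# Clean up the temporary file used for the demonstration.
Remove-Item -LiteralPath $samplePath
```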

This is the power of the pipeline passing real objects from one command to another.

I realize this last example does not write the processing message to the screen; however, you could add that back in if you wanted:

Get-Content -Path "C:\Users\sw_admin\FoundLinks.csv" | 
    Get-Item |
    ForEach-Object -Process {
        Write-Host("Processing: " + $_)
        $_ | Select-Object -Property FullName,LastWriteTime
    }

But you might also consider that many cmdlets support verbose output and try to just add -Verbose to some of your existing cmdlets. Sadly, it won't really help in this case.

One final note, when you pass items to the filesystem cmdlets via pipeline, the parameter they bind to is in fact -LiteralPath, not -Path, so your special characters are still safe.

briantist
  • If they're really concerned about it being an array, they can `[string[]]$inFiles = GC 'C:\Users\sw_admin\FoundLinks.csv' -Encoding UTF8` – TheMadTechnician Jun 07 '18 at 00:57
  • @TheMadTechnician yeah that's true – briantist Jun 07 '18 at 00:58
  • @briantist, I'm SO sorry. The title of my question no longer matches my question. I started out with a pipeline, but as I typed more information into my question, stackoverflow gave better suggestions, and I continued trying different things. I "unwound" the pipeline to get more clarity on which line / which file was having issues. Thank you for your amazingly detailed information, and I'll be sure to keep it for future use. – Sandra Jun 07 '18 at 10:47
  • I have now tried | Get-Item before the for each loop, but it's the same. It still barfs on the file that has the NBSP. – Sandra Jun 07 '18 at 10:49

I just ran into the same issue. It looks like Get-ChildItem (aka gci) expects the path in Unicode (UTF-16). So either convert the CSV file to Unicode, or convert the lines that contain the path to Unicode within your script. Tested on PS 5.1.22621.608.
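
If you want to try the file conversion suggested above, a minimal sketch might look like this. The function name `Convert-ToUtf16File` and its parameter names are hypothetical, and whether re-encoding the file actually resolves the lookup failure is not confirmed anywhere in this thread.

```powershell
function Convert-ToUtf16File {
    param(
        [string]$Source,
        [string]$Destination
    )

    # Read the lines as UTF-8 and write them back as UTF-16 LE,
    # which PowerShell's -Encoding parameter calls "Unicode".
    Get-Content -LiteralPath $Source -Encoding UTF8 |
        Set-Content -LiteralPath $Destination -Encoding Unicode
}
```

It could then be called with the input file from the question and an output path of your choosing, and the script would read the converted file with Get-Content -Encoding Unicode.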