
I have a PowerShell 5.1 script that reads one CSV file, looks up each value it reads in a second CSV file (which contains characters such as Ω and ±), and finally writes a result to a third CSV file. The lookup file comes from China as an Excel file, and I save it from Excel as CSV UTF-8.

It all works fine except for my regex searches: they work great at regex101.com and on the command line, but they don't seem to work inside the Where-Object cmdlet, where I need them.

For example, this works great at the prompt. Notice the Ω and ±1:

PS C:\Users\grefgarg> $u = "14.3kΩ ±1% 0.1W ±100ppm/? 0603 Chip Resistor - Surface Mount RoHS" 
PS C:\Users\grefgarg> $u -match "(^14.3kΩ |  14.3KOhm )"
True

But this does not, where $b is the lookup CSV and $c is the column to search, say "Description":

$b = Import-Csv $bfile -Encoding 'utf8'
$r = "(^14.3kΩ |  14.3KOhm )"
$a = $b | Where-Object { $_.$c -match $r }

$a.Count is 0. If, however, I replace the Ω with \D, it works again:

$r = "(^14.3k\D |  14.3KOhm )"

I would like to use the Ω and ±1 in my regex, but the \D works for now. I am asking in order to get a better understanding of how the pipeline, regex, and encoding work together.

I did try:

$PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }

I also searched for this specific issue of command line vs. Where-Object but I didn't see anything.

Thanks, Gregory

Steven
n0npr0phet
  • Are you sure the spaces in your regex string are correct? Did you try `[regex]::Escape()` on the strings? (For one thing, the dot should be `\.`) – Theo Jun 04 '21 at 18:07
  • Since the problem occurs with a _string literal_ in your _source code_, the likeliest explanation is that your _script file is misinterpreted by PowerShell_, which happens if the script is saved as UTF-8 _without a BOM_. Try saving your script as UTF-8 _with BOM_; see [this answer](https://stackoverflow.com/a/54790355/45375) for more information. – mklement0 Jun 04 '21 at 18:20
  • Try `$r = "(^14.3kΩ |^14.3kΩ |14.3KOhm )"` (_Ω U+03A9 Greek Capital Letter Omega_ or _Ω U+2126 Ohm Sign_). Note that a literal space character works the same way as `\s` character class ([whitespace](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_regular_expressions?view=powershell-5.1#whitespace)) – JosefZ Jun 04 '21 at 18:20
  • Thanks @Theo I will add \. – n0npr0phet Jun 04 '21 at 19:17
  • Saving as UTF-8 with BOM from VS Code did the trick. Thanks @mklement0 – n0npr0phet Jun 04 '21 at 19:18
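
For anyone landing here later, the explanation from the comments can be reproduced with a small sketch. It assumes the system ANSI code page is Windows-1252 (it may differ on other machines), and the variable names and the sample description string are made up for illustration. The idea: the editor saves Ω as the UTF-8 bytes CE A9, Windows PowerShell 5.1 reads a BOM-less script with the ANSI code page, and the regex literal therefore no longer contains Ω at all. The last pattern uses `\u` escapes so the script file stays pure ASCII, and it also accepts the Ohm sign U+2126 mentioned above.

```powershell
# Sketch only: simulate how a BOM-less UTF-8 script is misread by Windows PowerShell 5.1.
# Assumption: the ANSI code page is Windows-1252.

# Build the omega from its code point so this file contains no non-ASCII literals.
$omega = [string][char]0x03A9

# Bytes an editor writes for that character when saving as UTF-8 (CE A9).
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($omega)

# Decode the same bytes the way 5.1 would without a BOM.
$ansi    = [System.Text.Encoding]::GetEncoding(1252)
$misread = $ansi.GetString($utf8Bytes)   # two characters, U+00CE and U+00A9

# The CSV was imported with -Encoding utf8, so the data holds the real omega.
# Hypothetical description value, shortened from the question.
$description = "14.3k$omega " + [char]0x00B1 + "1% 0.1W 0603 Chip Resistor"

# Pattern as intended vs. pattern as the parser actually saw it.
$intended = '^14\.3k' + [regex]::Escape($omega) + ' '
$asParsed = '^14\.3k' + [regex]::Escape($misread) + ' '

# Encoding-proof alternative: \u escapes, accepting U+03A9 or U+2126.
$asciiOnly = '^14\.3k[\u03A9\u2126] '

$matchIntended  = $description -match $intended
$matchAsParsed  = $description -match $asParsed
$matchAsciiOnly = $description -match $asciiOnly

"intended: $matchIntended  as parsed: $matchAsParsed  ascii-only: $matchAsciiOnly"
```

This would also explain why the same pattern worked when typed at the prompt: console input is not read from a file, so no file decoding step is involved.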
