foreach ($line in $test) {
    $line.GetType()
    $newline = $line -split ("<.*?>") -split ("{.*?}") # remove html and css tags
    $newline.GetType()
}

I came across this when trying to use the .Trim() method on $newline. It works, but IntelliSense did not indicate that it would. I thought .Trim() would only work on String objects (BaseType: System.Object), but in this instance it seems to work on String[] objects as well (BaseType: System.Array).

$line.GetType() returns

IsPublic IsSerial Name                                     BaseType                                                                                
-------- -------- ----                                     --------                                                                                
True     True     String                                   System.Object 

$newline.GetType() returns

IsPublic IsSerial Name                                     BaseType                                                                                
-------- -------- ----                                     --------                                                                                
True     True     String[]                                 System.Array       

                        

First off, I would like to know why my original string was converted to an array, assuming it's the return value of -split... Is it now an array of characters? I am a little confused.

Secondly, if there is a good answer, why do the string methods work on what is technically an array?

Coming from Python and C/C++, thanks.

madmonkey
  • You called `-split` on it? Both strings and arrays support the `-split` operation. For strings you take one string and split it into an array of strings, divided wherever the split pattern is found. For arrays of strings you get a concatenation of all the string elements split the same way. So if you have one array containing 2 elements, and you split on a letter found in both elements, you now have 4 elements. – Lasse V. Karlsen Dec 31 '21 at 21:49
  • So basically, `string -split` gives an array of strings, and `array -split` gives another array. – Lasse V. Karlsen Dec 31 '21 at 21:53
  • @LasseV.Karlsen so the .Trim() method will work on an array of strings? Does it apply it to all elements in the array? – madmonkey Dec 31 '21 at 21:58
  • I am a bit fuzzy on whether calling `.Trim()` on an array means PowerShell has defined a `Trim()` operation on arrays, or whether it lifts `Trim()` operations on elements, when called on an array, to mean another array where the operation is called on each element, but yes, it will apply to each element. It will trim each element, but it will not trim the array: if the first or last element(s) are whitespace/empty, they will still be there; they will just be empty strings. – Lasse V. Karlsen Dec 31 '21 at 22:01
  • It seems PowerShell lifts operations on arrays to imply operations on each element, resulting in a new array. I just tested with things like `.ToUpper()`, `.Substring(..)`, etc. This is probably well documented somewhere. – Lasse V. Karlsen Dec 31 '21 at 22:03
  • I guess this is an extension of the thing that is documented here for `.LastName` - https://learn.microsoft.com/en-us/powershell/scripting/learn/deep-dives/everything-about-arrays?view=powershell-7.2 - for instance, for an array of objects $a, evaluating `$a.LastName` produces another array with each element's `LastName` value, and I guess the same thing is applied to methods. (The exception is a member the array itself has: for an array of strings, `$a.Length` returns the array's own length, i.e. the element count, not the lengths of the strings.) I'm still learning PowerShell so I can't point to better documentation, although I'm sure it exists. – Lasse V. Karlsen Dec 31 '21 at 22:06
  • Good points, @LasseV.Karlsen. As for the feature semi-officially known as _member enumeration_ (per the [blog post that introduced the feature](https://blogs.msdn.microsoft.com/powershell/2012/06/13/new-v3-language-features/) in PSv3): it is _described_, but not _named_, in the [conceptual `about_Properties` help topic](https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_properties#properties-of-scalar-objects-and-collections). [GitHub docs issue #8437](https://github.com/MicrosoftDocs/PowerShell-Docs/issues/8437) asks for it to be given an official name. – mklement0 Jan 01 '22 at 00:57
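The point made in the comments about `.Trim()` on an array can be checked with a minimal sketch, assuming PowerShell 3 or later. The sample string and the variable names are illustrative only: `Trim()` runs on each element, but the empty strings that `-split` produces at the edges are kept.

```powershell
# Splitting on tags leaves empty strings where a tag sat at the start or end.
$parts = '<b> Hello </b>' -split '<.*?>'    # '', ' Hello ', ''

# Trim() is applied to each element; the array itself is not "trimmed".
$trimmed = $parts.Trim()                    # '', 'Hello', ''
$trimmed.Count                              # 3

# Drop the empty strings explicitly if they are not wanted.
$nonEmpty = $trimmed | Where-Object { $_ }
$nonEmpty                                   # Hello
```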

2 Answers


Santiago Squarzon's helpful answer provides an effective solution to what your code is trying to do.


To answer your questions as asked, building on Lasse V. Karlsen's helpful comments:

I would like to know why my original string was converted to an array, assuming it's the return value of -split... Is it now an array of characters?

The -split operator splits a string or array of strings into substrings by a given separator regex and returns the substrings as a string array ([string[]]), not as an array of characters.

 'foo|bar' -split '\|' # -> [string[]] ('foo', 'bar')
  • With an array as input, the splitting operation is performed on each element separately, and the per-element result arrays are concatenated to form a single, flat array.

    'foo|bar', 'baz|quux' -split '\|' # -> [string[]] ('foo', 'bar', 'baz', 'quux')
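A small check of the points above, assuming PowerShell 3 or later (the variable names are illustrative): the result of `-split` is a `[string[]]` even when the separator never occurs, and its elements are substrings, not characters. Characters would instead come from the .NET `String.ToCharArray()` method.

```powershell
# No separator found: still a string array, with the whole input as its only element.
$noMatch = 'foo' -split '\|'
$noMatch.GetType().FullName              # System.String[]
$noMatch.Count                           # 1

# The elements are strings (substrings), not characters.
$parts = 'foo|bar' -split '\|'
$parts[0].GetType().FullName             # System.String

# An actual character array comes from String.ToCharArray().
'foo'.ToCharArray().GetType().FullName   # System.Char[]
```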
    

Secondly, if there is a good answer, why do the string methods work on what is technically an array?

What you're seeing is a feature previously semi-officially known as member enumeration and soon to be officially termed member-access enumeration: The ability to access a member (a property or a method) on a collection and have it implicitly applied to each of its elements, with the results getting collected in an array (for two or more elements).

Quick example:

# .Trim() is called on *each element* of the input array.
PS> (' foo', 'bar ').Trim() | ForEach-Object { "[$_]" }
[foo]
[bar]
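Two details of member-access enumeration are worth knowing, shown here as a sketch that assumes PowerShell 3 or later (the `.ForEach()` method needs version 4); the sample values are illustrative. A member the collection itself has, such as an array's `Length`, takes precedence over the elements' member of the same name. A single result is returned as a scalar rather than as a one-element array.

```powershell
$words = 'abcde', 'xyz'

# Arrays have no ToUpper() method, so it is applied to each element.
$words.ToUpper()                  # ABCDE, XYZ

# Arrays DO have a Length property, so the array's own member wins.
$words.Length                     # 2 (element count), not 5, 3

# To get each element's Length, enumerate explicitly.
$words | ForEach-Object Length    # 5, 3
$words.ForEach('Length')          # 5, 3 (PowerShell v4+)

# With a single element, the result is a scalar string, not an array.
@(' foo').Trim().GetType().Name   # String
```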
mklement0

Lasse V. Karlsen already provided the key information to understand why the strings ($line) are converted to string[] after being split. What you most likely wanted in this case is the -replace operator, which is also regex-based.
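To make the difference concrete, here is a minimal sketch assuming PowerShell 3 or later; the sample line mirrors one row of the HTML used below, and the variable names are illustrative. `-split` treats each regex match as a separator and returns a `[string[]]`, whereas `-replace` removes the matches and returns a single `[string]`.

```powershell
$line = '<td>Maria Anders</td>'

# -split: the tags act as separators, so the result is a string array.
$split = $line -split '<.*?>'
$split.GetType().Name        # String[]
$split.Count                 # 3: '', 'Maria Anders', ''

# -replace with no replacement string: the tags are deleted, one string remains.
$replaced = $line -replace '<.*?>'
$replaced                    # Maria Anders
$replaced.GetType().Name     # String
```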

Using the following as an example:

$htmlcss = @'
table {
  font-family: arial, sans-serif;
  border-collapse: collapse;
  width: 100%;
}

td, th {
  border: 1px solid #dddddd;
  text-align: left;
  padding: 8px;
}

tr:nth-child(even) {
  background-color: #dddddd;
}

</style>
</head>
<body>
<h2>HTML Table</h2>
<table>
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
</table>
</body>
</html>
'@

Use -replace to remove the HTML tags and CSS blocks, then -split to get a string[], and finally filter the array to skip blank lines:

$htmlcss -replace '(?s)<.*?>|\{.*?\}' -split '\r?\n' |
Where-Object { $_ -match '\S' }

Results in:

table 
td, th 
tr:nth-child(even) 
HTML Table
    Company
    Contact
    Country
    Alfreds Futterkiste
    Maria Anders
    Germany

Note that for \{.*?\} to match CSS blocks that span several lines, you must apply it to a single (multi-line) string, not to a string array (string[]) in which each line is a separate element. You also need to enable the (?s) flag (single-line mode) so that . matches newlines too. If you are reading the input from a file, use the -Raw switch of Get-Content to get it as a single string.
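A minimal sketch of the file-based case, assuming PowerShell 3 or later. To stay self-contained it first writes a small hypothetical sample file (`sample.html` in the temp directory), then contrasts reading it line by line with reading it via `-Raw`.

```powershell
# Hypothetical sample file: a multi-line CSS block followed by an HTML tag.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sample.html'
Set-Content -Path $path -Value "p {`n  color: red;`n}`n<b>Hello</b>"

# Without -Raw, Get-Content returns an array of lines, so the {...} block is
# spread over several elements and \{.*?\} cannot match it: the CSS body survives.
$perLine = (Get-Content -Path $path) -replace '(?s)<.*?>|\{.*?\}'

# With -Raw, the file is one multi-line string, so (?s) lets .*? cross newlines.
$whole = (Get-Content -Path $path -Raw) -replace '(?s)<.*?>|\{.*?\}' -split '\r?\n' |
    Where-Object { $_ -match '\S' }

$perLine   # 'p {', '  color: red;', '}', 'Hello'
$whole     # 'p ', 'Hello'

Remove-Item -Path $path
```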

Santiago Squarzon