2

I have a PowerShell script that returns output close to what I want, but there are a few lines and HTML-style tags I need to remove. I already have the following code to filter the output:

get-content "atxtfile.txt" | select-string -Pattern '<fields>' -Context 1

However, if I pipe that output into a second "select-string", I don't get any results back. I was looking at the regex examples online, but most of what I've seen relies on loops to achieve the objective. I'm more used to the Linux shell, where you can pipe output through multiple greps to filter out text. Is there a way to achieve the same thing or something similar with PowerShell? Here's the file I'm working with, as requested:

<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<actionOverrides>
    <actionName>Accept</actionName>
    <type>Default</type>
</actionOverrides>
<actionOverrides>
    <actionName>CancelEdit</actionName>
    <type>Default</type>
</actionOverrides>
<actionOverrides>
    <actionName>Today</actionName>
    <type>Default</type>
</actionOverrides>
<actionOverrides>
    <actionName>View</actionName>
    <type>Default</type>
</actionOverrides>
<compactLayoutAssignment>SYSTEM</compactLayoutAssignment>
<enableFeeds>false</enableFeeds>
<fields>
    <fullName>ActivityDate</fullName>
</fields>
<fields>
    <fullName>ActivityDateTime</fullName>
</fields>
<fields>
    <fullName>Guid</fullName>
</fields>
<fields>
    <fullName>Description</fullName>
</fields>
</CustomObject>

So, I only want the text between the <fullName> tags, and I have the following so far:

get-content "txtfile.txt" | select-string -Pattern '<fields>' -Context 1

This gives me the lines around each <fields> tag, but what I actually need is the <fullName> line without the XML tags.

murkywaters
  • 1
    We need to see examples of the file as it is before filtering and how you expect it to be after filtering at a minimum. Code would help too. Please read these two links then revise your question: [How to ask a good question](https://stackoverflow.com/help/how-to-ask) and [How to create a Minimum, Complete, and Verifiable Example - MCVE](https://stackoverflow.com/help/mcve) – EBGreen Mar 20 '18 at 13:55
  • 1
    `Select-String` doesn't return a string. It returns a `[Match]` object. – Maximilian Burszley Mar 20 '18 at 14:08
  • 2
    Using direct string manipulation on structured string data is pretty much the wrong answer. Just load this as XML then get the value. – EBGreen Mar 20 '18 at 14:13

3 Answers

4

The simplest PSv3+ solution is to use PowerShell's built-in XML DOM support, which makes an XML document's nodes accessible as a hierarchy of objects with dot notation:

PS> ([xml] (Get-Content -Raw txtfile.txt)).CustomObject.fields.fullName
ActivityDate
ActivityDateTime
Guid
Description

Note: Even though the [xml] (Get-Content -Raw ...) approach to parsing an XML document is convenient, it isn't fully robust with respect to character encoding; see this answer.
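As a rough sketch of a more encoding-robust way to load the file: letting the XML parser read the file itself means the encoding named in the XML declaration is honored. The temporary file name below is an assumption made purely so the example is self-contained; substitute your own path.

```powershell
# Hypothetical sample file, written to the temp folder for illustration only.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sample-customobject.xml'
Set-Content -LiteralPath $path -Encoding UTF8 -Value @'
<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<fields>
    <fullName>ActivityDate</fullName>
</fields>
<fields>
    <fullName>Guid</fullName>
</fields>
</CustomObject>
'@

# Let XmlDocument.Load() read the file, instead of casting Get-Content output.
$doc = New-Object System.Xml.XmlDocument
$doc.Load((Convert-Path -LiteralPath $path))   # .NET methods need a full path

# Same dot-notation access as before.
$names = $doc.CustomObject.fields.fullName
$names
```

The Convert-Path call matters because .NET methods do not necessarily share PowerShell's current directory.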

Note how even though .fields is an array - representing all child <fields> elements of top-level element <CustomObject> - .fullName was directly applied to it and returned the values of child elements <fullName> across all array elements (<fields> elements) as an array.

This ability to access a property on a collection and have it implicitly applied to the collection's elements, with the results getting collected in an array, is a generic PSv3+ feature called member-access enumeration.
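To illustrate, here is a minimal sketch comparing member-access enumeration with the explicit pipeline it stands in for. The inline XML is a trimmed stand-in for the question's file, and the variable names are chosen only for this example.

```powershell
# Inline sample mirroring the question's file, trimmed to two <fields> elements.
$doc = [xml] @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<fields>
    <fullName>ActivityDate</fullName>
</fields>
<fields>
    <fullName>Guid</fullName>
</fields>
</CustomObject>
'@

# Member-access enumeration: .fullName is applied to every <fields> element.
$implicit = $doc.CustomObject.fields.fullName

# The explicit equivalent, spelled out with ForEach-Object.
$explicit = $doc.CustomObject.fields | ForEach-Object { $_.fullName }

$implicit
```

Both variables should hold the same two strings; the dot-notation form is simply shorter.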


As an alternative, consider using the Select-Xml cmdlet (available in PSv2 too), which supports XPath queries that generally allow for more complex extraction logic (though not strictly needed here); Select-Xml is a high-level wrapper around the [xml] .NET type's .SelectNodes() method.
The following is the equivalent of the solution above:

$namespaces = @{ ns="http://soap.force.com/2006/04/metadata" }
$xpathQuery = '/ns:CustomObject/ns:fields/ns:fullName'
(Select-Xml -LiteralPath txtfile.txt $xpathQuery -Namespace $namespaces).Node.InnerText

Note:

Unlike with dot notation, XML namespaces must be considered when using Select-Xml.

Given that <CustomObject> and all its descendants are in the default namespace (declared via the xmlns attribute), identified by the URI http://soap.force.com/2006/04/metadata, you must:

  • define this namespace in a hashtable you pass as the -Namespace argument
    • Caveat: Default namespace xmlns is special in that it cannot be used as the key in the hashtable; instead, choose an arbitrary key name such as ns, but be sure to use that chosen key name as the node-name prefix (see next point).
  • prefix all node names in the XPath query with the namespace name followed by :; e.g., ns:CustomObject
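
The effect of the two points above can be sketched as follows, using Select-Xml's -Content parameter with an inline string so that no file is needed. The prefix name ns and the variable names are arbitrary choices for this example.

```powershell
# Inline sample mirroring the question's file, trimmed to two <fields> elements.
$xmlText = @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<fields>
    <fullName>ActivityDate</fullName>
</fields>
<fields>
    <fullName>Guid</fullName>
</fields>
</CustomObject>
'@

# Arbitrary prefix "ns" mapped to the document's default-namespace URI.
$ns = @{ ns = 'http://soap.force.com/2006/04/metadata' }

# Unprefixed names refer to elements in no namespace, so nothing is found.
$unprefixed = @(Select-Xml -Content $xmlText -XPath '/CustomObject/fields/fullName')

# Prefixed names, together with -Namespace, find the elements.
$prefixed = @(Select-Xml -Content $xmlText -XPath '/ns:CustomObject/ns:fields/ns:fullName' -Namespace $ns)

$unprefixed.Count
$prefixed | ForEach-Object { $_.Node.InnerText }
```

An empty result from a seemingly correct XPath query is a common sign that the namespace prefix is missing.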
mklement0
1

Ok. So if you have that file then:

[xml]$xml = Get-Content atxtfile.txt
$xml.CustomObject.fields | select fullname
EBGreen
  • Hi EBGreen, so it appears I have multiple `<fullName>` tags scattered about the XML file, but the ones I need are encapsulated within the `<fields>` tags. I was thinking two filters, one to grab the `<fields>` information and then another to pick out the `<fullName>` and drop the XML tag. How would that change what you have so far? – murkywaters Mar 20 '18 at 14:21
  • The code snippet that I posted will only get the value of Fullname tags that are contained in Fields tags. Did you run it? – EBGreen Mar 20 '18 at 14:23
  • Yes, but I now have more information than what I want. I take it once you call "select" or "select-string", you can do it only once? – murkywaters Mar 20 '18 at 14:25
  • What do you actually want? This provides exactly what you asked for. Why is this too much information? Did you only want one value? If so, how would you identify that value? – EBGreen Mar 20 '18 at 14:26
  • I want the text between the fullname strings that are only between the fields strings. There are other fullnames buried in data strings, but I don't want those. Selecting on fullname alone will not work, as it grabs both – murkywaters Mar 20 '18 at 14:29
  • @murkywaters that's not how his snippet works at all. It will only grab `CustomObject.fields.fullname` – Maximilian Burszley Mar 20 '18 at 14:33
  • Your example that you provided literally has 4 instances where there is a fullname string that is between fields strings. Even by your description of what you want you would get 4 results. – EBGreen Mar 20 '18 at 14:34
1

mklement0 has provided the best solution to the problem, but to answer the question about filtering text twice using Select-String:

If we pipe the results of Select-String into Out-String -Stream, we can pass them to Select-String again. This can all be done on one line, but I used a variable to make it more readable.

$Match = Get-Content "atxtfile.txt" | Select-String -Pattern '<fields>' -Context 1
$Match | Out-String -Stream  | Select-String -Pattern "Guid"

If we pipe $match to Get-Member, we will find a couple of interesting properties.

$Match.Matches.Value

This will display all the instances of <fields> (the pattern match).

$Match.Context.PreContext
$Match.Context.PostContext

This will contain the lines before and after <fields> (the context before and after).
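
Building on those properties, here is a minimal sketch of a grep-style pipeline that keeps only the line after each <fields> match and strips the tags. The in-memory $lines array stands in for Get-Content output, and the tag-stripping regex is a simplification that assumes one element per line, so it is less robust than the XML-based answers.

```powershell
# Stand-in for: Get-Content "atxtfile.txt"
$lines = @(
    '<type>Default</type>'
    '<fields>'
    '    <fullName>ActivityDate</fullName>'
    '</fields>'
    '<fields>'
    '    <fullName>Guid</fullName>'
    '</fields>'
)

$names = $lines |
    # -Context 0,1 requests no lines before and one line after each match.
    Select-String -Pattern '<fields>' -Context 0,1 |
    # Take the line after the match, remove anything tag-like, trim whitespace.
    ForEach-Object { ($_.Context.PostContext[0] -replace '<[^>]+>').Trim() }

$names
```

Because each pipeline stage works with the MatchInfo object's properties instead of its formatted display text, no Out-String step is needed here.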

Dave
  • Thanks for the follow-up, Dave. It's probably not the best way to go about it in PowerShell, but it's essentially how bash scripting in Linux operates, and that's what I'm most familiar with. – murkywaters Mar 20 '18 at 19:38