
In an XML file with about 100 lines, there is one string with a specific pattern that I want to find and write to a new text file.

What the string contains is unknown and can vary, but the pattern is the same. For example:

12hi34

99ok45

What they have in common is that each is 6 characters long, with the following layout (zero-based positions):

0-1: digits

2-3: letters

4-5: digits

Is there a way to write a PowerShell script that finds the string that fits the pattern and exports it to a text file?

I'm new to PowerShell and scripting. I tried Googling the problem and stumbled upon Select-String, but that didn't solve my problem. I hope some of you can guide me here. Thanks.

Edit: The string is outside the root element as some "free text". It is not a traditional XML file.

mklement0
Bob
  • If it's a valid XML file you should be able to use the XML node / element to extract whatever is inside. No need to use string or regex acrobatics. ;-) You may learn about working with XML files before you proceed with your string / pattern idea. – Olaf May 11 '19 at 11:51
  • @Olaf I didn't make it clear in the description, but the first part of the XML file contains elements. But there is also "free text" if you could say, outside the root element in the file. It is a nontraditional XML file, so I have to treat it as a text file. – Bob May 11 '19 at 12:02
  • While agreeing with @Olaf you might try `(Get-Content .\Uncommon.xml) | Select-String '(\b\d{2}[a-z]{2}\d{2}\b)' |ForEach-Object{$_.Matches.Value}|Set-Content New.txt` –  May 11 '19 at 12:15
  • @LotPings You support to grow a generation of lazy and demanding help vampires when you always deliver ready to use code right away ;-) :-D :-P I just hope Bob will use your code to try to understand what it's doing and to learn. :-) – Olaf May 11 '19 at 12:44
  • What does the xml file look like? – js2010 May 11 '19 at 15:04

2 Answers


Try this...

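# Read the whole file as an array of lines (-ReadCount 0 sends them in one batch)
$f = Get-Content '<xml-file>' -ReadCount 0
foreach ($l in $f) {
    # 1-3 digits, 2-3 letters, 1-5 digits: looser than the 2-2-2 layout in the question
    if ($l -match '[0-9]{1,3}[a-zA-Z]{2,3}[0-9]{1,5}') {
        # $Matches[0] holds the text matched on this line
        Write-Output $Matches[0]
    }
}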

This reads the contents of the file into a variable, iterates over each line, and outputs the part of each line that matches the pattern.

Here is a sample of the matching piece...

[Image: sample of the matching output]
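The line-by-line approach can be tightened to the exact layout from the question and extended to write the result to a file. The following is only a sketch under assumptions: the file names and the sample content are hypothetical stand-ins, and the `\b` word boundaries assume the token is not glued to other letters or digits.

```powershell
# Hypothetical paths and sample content, standing in for the real file.
$inFile  = Join-Path ([IO.Path]::GetTempPath()) 'sample-in.xml'
$outFile = Join-Path ([IO.Path]::GetTempPath()) 'sample-out.txt'
Set-Content -Path $inFile -Value '<root><a>1</a></root>', 'free text 12hi34 here'

# Exactly two digits, two letters, two digits, delimited by word boundaries.
$pattern = '\b[0-9]{2}[a-zA-Z]{2}[0-9]{2}\b'

# Collect the first match from every line that contains one.
$found = foreach ($line in Get-Content -Path $inFile) {
    if ($line -match $pattern) {
        $Matches[0]
    }
}

# Write the collected matches, one per line, to the output file.
$found | Set-Content -Path $outFile
```

With a scalar left-hand side, `-match` records only the first match per line in `$Matches`, which is enough if the token occurs once in the file.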

Adam

Assuming there's only one token of interest in the file, and that the letters are limited to English letters 'a' through 'z':

(Get-Content -Raw in.xml) -replace '(?s).*(\d{2}[a-z]{2}\d{2}).*', '$1' > out.txt

Note:

  • If no matching token is found, the input file's entire content is written to `out.txt`.

  • On Windows PowerShell, `>` produces UTF-16LE ("Unicode") files by default (in PowerShell Core it is UTF-8 without a BOM); pipe to `Set-Content out.txt -Encoding ...` instead to create a file with a different encoding.

  • `Get-Content -Raw` reads the entire input file as a single string.

  • The `-replace` operator uses regular expressions (regexes) for matching - see this answer for more information.

    • Inline option `(?s)` at the start of the regex makes `.` match newlines too.
    • By default, matching is case-insensitive; use `-creplace` for case-sensitive matching.
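
If the no-match case from the first note is a concern, one possible variation is to test for a match explicitly and write the file only on success. This is a sketch, not part of the one-liner above; the file names and sample content are hypothetical, and `utf8` is just one encoding choice.

```powershell
# Hypothetical paths and sample content, standing in for the real file.
$inFile  = Join-Path ([IO.Path]::GetTempPath()) 'raw-in.xml'
$outFile = Join-Path ([IO.Path]::GetTempPath()) 'raw-out.txt'
Set-Content -Path $inFile -Value '<root><a>1</a></root>', 'free text 99ok45 here'

# Read the whole file as a single string and look for the token.
$text = Get-Content -Raw -Path $inFile
$m = [regex]::Match($text, '\d{2}[a-z]{2}\d{2}', 'IgnoreCase')

if ($m.Success) {
    # Write only the matched token, with an explicit encoding.
    Set-Content -Path $outFile -Value $m.Value -Encoding utf8
} else {
    Write-Warning "No matching token found in $inFile"
}
```

Note that `[regex]::Match` returns the first occurrence, whereas the greedy leading `.*` in the `-replace` approach captures the last one; with a single token in the file, the two agree.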
mklement0