Full Solution

# Description: Adds <VATMODE>X</VATMODE> XML tags to files arriving from server, underneath each RECORD CODE line.
# Script tested and works using:
#   - Powershell v5.1 on Windows 10 Pro
#   - Powershell v4.0 on Windows Server 2008 R2.
#   - Does NOT work on Powershell v2.0

# References
# My own question: https://stackoverflow.com/questions/45639945/powershell-advanced-insert-into-xml-files-only-if-vatmode-tag-is-missing
# https://stackoverflow.com/questions/31678072/insert-content-into-specific-place-in-text-file-in-powershell
# https://stackoverflow.com/questions/1875617/insert-content-into-text-file-in-powershell
# https://social.technet.microsoft.com/wiki/contents/articles/4310.powershell-working-with-regular-expressions-regex.aspx
# http://blog.danskingdom.com/fix-problem-where-windows-powershell-cannot-run-script-whose-path-contains-spaces/
# https://community.spiceworks.com/topic/857690-automatically-and-silently-bypass-execution-policy-for-a-powershell-script
# http://leelusoft.blogspot.com.ng/p/watch-4-folder-25.html
# References

# Assign the directory where the XML files arrive from the server
$xmlFilesLocation = "C:\XML_dumping\"

# Change directory. Without this, the script will run in the same directory that the script is located at, and that's wrong
cd $xmlFilesLocation

# Show the directory so we can easily look at what's going on. Comment this out if it becomes annoying.
Invoke-Item $xmlFilesLocation

# Regular expression to match RECORD CODE lines
$regEx = "(\W\w{6}\s\w{4}\W.+)"

# A String variable which contains the VATMODE XML tag
$vatModeExists = "<VATMODE>X</VATMODE>"

# Assign the VATMODE tag, preceding it with three tabs for proper indentation
$vatModeTag = "`t`t`t<VATMODE>X</VATMODE>"

# Get all XML files in the directory
$files = Get-ChildItem -Path $xmlFilesLocation -Filter *.xml

# Loop through each file separately
foreach ($file in $files) {

    # Read the whole file as a single string
    $content = Get-Content -Path $file.FullName -Raw

    # If <VATMODE>X</VATMODE> is already in the file...
    if ($content -match $vatModeExists) {
        # ...then do not process the file (skip it and move on to the next one)
        continue
    }

    # Otherwise replace each match of the regular expression with itself ($1) followed by
    # $vatModeTag, which inserts the tag directly underneath the RECORD CODE line
    $content = [regex]::Replace($content, $regEx, ('$1' + "`n" + $vatModeTag))

    # Save the file that now has the new $vatModeTag
    $content | Out-File -Encoding utf8 $file.FullName
}
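
To see what the replacement step does without touching any files, the same pattern can be run against an in-memory string. This is only a sketch: the `$sample` text is a made-up fragment modelled on the output shown in the question, not a real file from the server.

```powershell
# Hypothetical fragment standing in for the contents of one XML file
$sample = "<CUSTORDERS>`n`t`t<RECORD CODE=`"NX0100096`">`n`t`t`t<INPUTDATE>19/07/2017</INPUTDATE>`n"

# Same pattern and tag as in the script above
$regEx      = "(\W\w{6}\s\w{4}\W.+)"
$vatModeTag = "`t`t`t<VATMODE>X</VATMODE>"

# '$1' is single-quoted so PowerShell passes it to .NET untouched;
# .NET then substitutes the text captured by the parentheses
$result = [regex]::Replace($sample, $regEx, ('$1' + "`n" + $vatModeTag))

# Print the result for visual inspection
$result
```

Because `.+` stops at the end of the line, the tag lands on a new line between the RECORD CODE line and whatever followed it.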

Problem Statement

I'm trying to achieve something similar to this, but with added complexity. These are XML files that arrive from a server every day and are dropped into a single folder for import into the accounting system. The accounting system won't import a file unless every RECORD CODE parent has a <VATMODE>X</VATMODE> child directly under it. The files arrive in one of two ways: one by one, or in batches. Their names consist of a varying prefix followed by a continuously incrementing number, for example NX1000060.xml, NX1000061.xml or ABN000028.xml.

Powershell Script

# Regex to match RECORD CODE lines
$regEx = "\W\w{6}\s\w{4}\W.+"

#Regex to match exactly <VATMODE>X</VATMODE>
$vatModeExists = "\W\w{7}.\w\W{2}\w{7}."

# Assign the VATMODE tag, preceding it with three tabs for proper indentation
$vatModeTag = "`t`t`t<VATMODE>X</VATMODE>"

# Get all XML files in the directory
$files = Get-ChildItem -Path "C:\XML_dumping" -Filter *.xml

# Get the number of XML files in the directory
$numberOfFiles = (Get-ChildItem -Path "C:\XML_dumping" -Filter *.xml | Measure-Object).Count

for ($i=1; $i -lt $numberOfFiles; $i++) { # Loop through each file separately
    $content = (Get-Content $files[$i - 1]) # Scan the contents of each file
    if ($content -match $vatModeExists) { # If <VATMODE>X</VATMODE> is detected in the file...
        break # ...then do not process the file (skip it)
    }

    # Get the matched RECORD CODE lines
    $found = $content -match $regEx
    for ($j=0; $j -lt $found.Length; $j++ ) { # Loop through each matched RECORD CODE line
        echo  $found[$j] $vatModeTag # Insert <VATMODE>X</VATMODE> right under RECORD CODE line
        # save the files that now have VATMODE inserted into them, but how?
    }
}

The above script is supposed to append VATMODE tags under each RECORD CODE line, as in the output below.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<EXPORT>
    <IMPORTMODEL>NEX</IMPORTMODEL>
    <SESSION>1000060</SESSION>
    <CUSTORDERS>
        <RECORD CODE="NX0100096">
        <VATMODE>X</VATMODE>
        <INPUTDATE>19/07/2017</INPUTDATE>
        <!--...and so on...-->

In Powershell ISE, the script runs fine with the echo (which is for my visual inspection), but how do I insert VATMODE and save the files that I've added the VATMODE to?
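
For context on why the echo loop prints exactly the RECORD CODE lines: without -Raw, Get-Content returns one string per line, and -match applied to an array returns the matching elements rather than a Boolean. The sketch below illustrates that difference on a hypothetical three-line document (the sample lines are assumptions, not taken from a real file).

```powershell
# Hypothetical document, held once as an array of lines and once as a single string
$lines = @('<CUSTORDERS>', '<RECORD CODE="NX0100096">', '<INPUTDATE>19/07/2017</INPUTDATE>')
$raw   = $lines -join "`n"

$regEx = "\W\w{6}\s\w{4}\W.+"

# On an array, -match acts as a filter and returns the elements that match
$found = @($lines -match $regEx)
$found

# On a single string, -match returns $true or $false
$isMatch = $raw -match $regEx
$isMatch
```

This matters for saving: editing the filtered `$found` array does not change the original lines, so any rewrite has to work on the full content.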

Pseudocode

  1. Assign regexes
  2. Assign VATMODE tag
  3. Get files list
  4. Get files count
  5. Get content of each file separately
  6. Check if VATMODE already exists and break
  7. Otherwise append VATMODE
  8. Save files that got the new VATMODE
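
The steps above can be sketched as a single function. This is a minimal sketch under stated assumptions: the function name `Add-VatModeTag` and its `-Directory` parameter are hypothetical, step 6 is read as "skip this file" (so it uses continue rather than break), and the patterns are the ones from the scripts in this post.

```powershell
# Hypothetical helper; the name and parameter are illustrative only
function Add-VatModeTag {
    param(
        [Parameter(Mandatory = $true)] [string] $Directory
    )

    # 1-2. Assign the regex and the VATMODE tag
    $regEx      = '(\W\w{6}\s\w{4}\W.+)'
    $vatModeTag = "`t`t`t<VATMODE>X</VATMODE>"

    # 3-5. Get the file list and read each file separately
    foreach ($file in Get-ChildItem -Path $Directory -Filter *.xml) {
        $content = Get-Content -Path $file.FullName -Raw

        # Nothing to do for empty files
        if (-not $content) { continue }

        # 6. VATMODE already exists: skip this file only
        if ($content -match '<VATMODE>X</VATMODE>') { continue }

        # 7. Otherwise insert VATMODE under each RECORD CODE line
        $content = [regex]::Replace($content, $regEx, ('$1' + "`n" + $vatModeTag))

        # 8. Save the file and emit its name so the caller can see what changed
        Set-Content -Path $file.FullName -Value $content -Encoding UTF8
        $file.Name
    }
}
```

It could then be called as `Add-VatModeTag -Directory 'C:\XML_dumping'`; running it twice in a row should leave the files unchanged the second time, because every file then already contains the tag.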
Ugo
  • So looking at what you had does not make much sense to me. 1. Get All XML files 2. Get-Contents of all those files 3. Loop Every File 4. Try and display the contents? 5. Save all the contents of all 3 files back into each file? – ArcSet Aug 11 '17 at 17:11
  • @ArcSet That's what it looks like indeed, and it seems wrong. What I need is: 1. Take the first file. 2. Read it and see if VATMODE tag exists. 3a. If it exists, break. 3b. Else, append VATMODE tag. 4. Save the file. 5. Take file n+1 and repeat steps 1 to 4 until there are no more files to be processed. – Ugo Aug 12 '17 at 08:10
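
One detail in the steps described in the comment above: in PowerShell, break leaves the whole loop, so "take file n+1 and repeat" corresponds to continue instead. A small sketch with made-up file names (no real files are read) shows the difference.

```powershell
# Hypothetical file names; the second one stands for a file that already has VATMODE
$names         = 'NX1000060.xml', 'NX1000061.xml', 'ABN000028.xml'
$alreadyTagged = 'NX1000061.xml'

# With break, the loop stops at the first file that already has the tag
$withBreak = @()
foreach ($name in $names) {
    if ($name -eq $alreadyTagged) { break }
    $withBreak += $name
}

# With continue, only that one file is skipped
$withContinue = @()
foreach ($name in $names) {
    if ($name -eq $alreadyTagged) { continue }
    $withContinue += $name
}

"Processed with break:    $withBreak"
"Processed with continue: $withContinue"
```

With these sample names, the break loop handles only the first file, whereas the continue loop handles the first and the third.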

2 Answers

I use [regex]::replace, and it works for me. The parentheses in the regex create a capture group, whose value is retrieved with $1 in the replacement. I also replaced -lt with -le in your for loop so that the last file is processed too.

# Regex to match RECORD CODE lines
$regEx = "\W\w{6}\s\w{4}\W.+"
$regExParen = "(\W\w{6}\s\w{4}\W.+)"
#Regex to match exactly <VATMODE>X</VATMODE>
$vatModeExists = "\W\w{7}.\w\W{2}\w{7}."

# Assign the VATMODE tag, preceding it with three tabs for proper indentation
$vatModeTag = "`t`t`t<VATMODE>X</VATMODE>"

# Get all XML files in the directory
$files = Get-ChildItem -Path "C:\Users\user1\Documents\XML_dumping" -Filter *.xml

# Get the number of XML files in the directory
$numberOfFiles = (Get-ChildItem -Path "C:\Users\user1\Documents\XML_dumping" -Filter *.xml | Measure-Object).Count
for ($i=1; $i -le $numberOfFiles; $i++) { # Loop through each file separately
    $content = (Get-Content -Path ($files[$i - 1].FullName) -raw) # Read the whole file as one string

    # If <VATMODE>X</VATMODE> is detected in the file...
    if ($content -match $vatModeExists) {
        echo "<VATMODE>X</VATMODE>"
        continue # ...then do not process the file (skip it and go on to the next one)
    }

    # Replace each match of the regular expression in $content with itself ($1) followed by VATNUMBER,
    # which inserts the text right under the RECORD CODE line
    $content = [regex]::replace($content, $regExParen, ('$1'+"`r`n"+"VATNUMBER"+"`r`n"))
    echo $content

    # Save the file that now has the text inserted into it
    $content | Out-File -encoding utf8 ($files[$i - 1].FullName)
}
  • I do not want to be too hasty with my conclusion, but I've tested your script rigorously and it works. I have a few more tests to do (I'm testing as we speak), but it's working perfectly for now. I've replaced "VATNUMBER" with `$vatModeTag` and removed +"`r`n" because it adds an extra line that isn't needed. These are only cosmetic changes though, and the logic works flawlessly. I will report my findings, accept your answer and post the entire script soon. Thank you so very much! :) – Ugo Aug 13 '17 at 14:17
  • As promised, I used your solution and tweaked it a bit and posted the full script at the top of the post. Once again, thank you so much! :) – Ugo Aug 24 '17 at 13:46
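
A side note on the "match exactly" pattern used in the answer: `\W\w{7}.\w\W{2}\w{7}.` describes the shape of the tag rather than its text, so other tags of the same shape would also pass the check. The final script at the top of the post checks for the literal tag instead. The sketch below compares the two; the `<TAXCODE>` tag is a made-up example, and `[regex]::Escape` is used here only as a general way to turn literal text into a safe pattern.

```powershell
# Shape-based pattern from the answer, and a pattern built from the literal tag
$looseCheck   = "\W\w{7}.\w\W{2}\w{7}."
$literalCheck = [regex]::Escape('<VATMODE>X</VATMODE>')

# Hypothetical line that is not a VATMODE tag but has the same shape
$otherTag = '<TAXCODE>A</TAXCODE>'
$vatTag   = '<VATMODE>X</VATMODE>'

$looseOnOther   = $otherTag -match $looseCheck     # shape-based pattern accepts the other tag
$literalOnOther = $otherTag -match $literalCheck   # literal pattern rejects it
$literalOnVat   = $vatTag   -match $literalCheck   # literal pattern accepts the real tag

"Loose pattern on other tag:   $looseOnOther"
"Literal pattern on other tag: $literalOnOther"
"Literal pattern on VATMODE:   $literalOnVat"
```

Whether this matters depends on which other tags can occur in the files; if none share the shape, both checks behave the same.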

So I tried to come up with something, but it is a little different from what you asked for.

# Assign the wildcard for the XML files that arrive from the server (resolved in the current directory)
$fileName = "*.xml"

# Assign the regular expression patterns
$regEx = "\W\w{6}\s\w{4}\W.+"
$vatModeExists = "\W\w{7}.\w\W{2}\w{7}."

# Assign the VATMODE tag, preceding it with a line break and three tabs for proper indentation
$vatModeTag = "`n`t`t`t<VATMODE>X</VATMODE>"

# Collect the lines of every processed file
$output = @()
foreach ($file in Get-ChildItem -Filter $fileName) {
    $lines = Get-Content $file.FullName

    # Skip the file if any of its lines already contains VATMODE
    if ($lines -match $vatModeExists) { continue }

    foreach ($line in $lines) {
        if ($line -match $regEx) { # if RECORD CODE line is found
            $line += $vatModeTag # append VATMODE after the RECORD CODE line
        }
        $output += $line
    }
}

# Write everything that was collected into a single file
Set-Content -Path SomeFile.xml -Value $output
ArcSet
  • I'm going to try your proposed solution and report back my findings. Thank you. – Ugo Aug 12 '17 at 08:12
  • I've re-written the code from the ground up, but I'm unable to save the modified files – Ugo Aug 12 '17 at 21:39