Full Solution
# Description: Adds <VATMODE>X</VATMODE> XML tags to files arriving from server, underneath each RECORD CODE line.
# Script tested and working with:
# - PowerShell v5.1 on Windows 10 Pro
# - PowerShell v4.0 on Windows Server 2008 R2
# - Does NOT work on PowerShell v2.0
# References
# My own question: https://stackoverflow.com/questions/45639945/powershell-advanced-insert-into-xml-files-only-if-vatmode-tag-is-missing
# https://stackoverflow.com/questions/31678072/insert-content-into-specific-place-in-text-file-in-powershell
# https://stackoverflow.com/questions/1875617/insert-content-into-text-file-in-powershell
# https://social.technet.microsoft.com/wiki/contents/articles/4310.powershell-working-with-regular-expressions-regex.aspx
# http://blog.danskingdom.com/fix-problem-where-windows-powershell-cannot-run-script-whose-path-contains-spaces/
# https://community.spiceworks.com/topic/857690-automatically-and-silently-bypass-execution-policy-for-a-powershell-script
# http://leelusoft.blogspot.com.ng/p/watch-4-folder-25.html
# Assign the directory where the XML files arrive from the server
$xmlFilesLocation = "C:\XML_dumping\"
# Change directory. Without this, the bare file names used below would resolve against whatever the current working directory happens to be, not the XML folder
cd $xmlFilesLocation
# Show the directory so we can easily look at what's going on. Comment this out if it becomes annoying.
Invoke-Item $xmlFilesLocation
# Regular expression to match RECORD CODE lines
$regEx = "(\W\w{6}\s\w{4}\W.+)"
# A String variable which contains the VATMODE XML tag
$vatModeExists = "<VATMODE>X</VATMODE>"
# Assign the VATMODE tag, preceding it with three tabs for proper indentation
$vatModeTag = "`t`t`t<VATMODE>X</VATMODE>"
# Get all XML files in the directory (wrapped in @() so that a single file still yields an array)
$files = @(Get-ChildItem -Path $xmlFilesLocation -Filter *.xml)
# Count the XML files from the list already retrieved, rather than querying the directory a second time
$numberOfFiles = $files.Count
# First, loop through all files separately to check if <VATMODE>X</VATMODE> exists, and skip if true
for ($i=1; $i -le $numberOfFiles; $i++) {
# Scan the contents of each file
$content = (Get-Content $files[$i - 1] -raw)
# If <VATMODE>X</VATMODE> is detected in the file...
if ($content -match $vatModeExists) {
# ...then skip to the next file (break would exit the whole loop and leave the remaining files unchecked)
continue
}
}
# Then, loop through all files (again) separately to check if <VATMODE>X</VATMODE> is missing, and process if true
for ($j=1; $j -le $numberOfFiles; $j++) {
# Scan the contents of each file
$content = (Get-Content $files[$j - 1] -raw)
# If <VATMODE>X</VATMODE> is missing in the file...
if ($content -notmatch $vatModeExists) {
# ...then replace in $content the regular expression with $vatModeTag and insert it directly underneath RECORD CODE line
$content= [regex]::replace($content, $regEx, ('$1'+"`n"+"$vatModeTag"))
# Save the file that now has the new $vatModeTag and output it
$content | Out-File -encoding utf8 $files[$j - 1]
}
}
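A note on skipping files inside a loop: in PowerShell, `break` leaves the enclosing loop entirely, whereas `continue` abandons only the current iteration and moves on to the next item. For "skip this file and carry on with the rest", `continue` is the keyword that matches the intent. A minimal sketch using hypothetical in-memory strings in place of real file contents:

```powershell
# Hypothetical sample data standing in for the contents of three files
$contents = @(
    '<RECORD CODE="A1">',
    '<VATMODE>X</VATMODE>',
    '<RECORD CODE="A2">'
)

# With break: the loop stops at the first item that contains the tag
$seenWithBreak = @()
foreach ($c in $contents) {
    if ($c -match '<VATMODE>X</VATMODE>') { break }
    $seenWithBreak += $c
}

# With continue: only the tagged item is skipped, the rest are still processed
$seenWithContinue = @()
foreach ($c in $contents) {
    if ($c -match '<VATMODE>X</VATMODE>') { continue }
    $seenWithContinue += $c
}

"break:    $($seenWithBreak.Count) item(s) processed"     # 1
"continue: $($seenWithContinue.Count) item(s) processed"  # 2
```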
Problem Statement
I'm trying to achieve something similar to this, but with added complexity. These are XML files that arrive from a server every day and are dropped into a single folder for import into the accounting system. The accounting system won't import a file unless every RECORD CODE parent has a <VATMODE>X</VATMODE> child. The files arrive in one of two ways: one by one, or in batches. They have different names, with continuously incrementing numbers and varying prefixes, for example NX1000060.xml, NX1000061.xml, ABN000028.xml, and so on.
PowerShell Script
# Regex to match RECORD CODE lines
$regEx = "\W\w{6}\s\w{4}\W.+"
#Regex to match exactly <VATMODE>X</VATMODE>
$vatModeExists = "\W\w{7}.\w\W{2}\w{7}."
# Assign the VATMODE tag, preceding it with three tabs for proper indentation
$vatModeTag = "`t`t`t<VATMODE>X</VATMODE>"
# Get all XML files in the directory
$files = Get-ChildItem -Path "C:\XML_dumping" -Filter *.xml
# Get the number of XML files in the directory
$numberOfFiles = (Get-ChildItem -Path "C:\XML_dumping" -Filter *.xml | Measure-Object).Count
for ($i=1; $i -lt $numberOfFiles; $i++) { # Loop through each file separately
$content = (Get-Content $files[$i - 1]) # Scan the contents of each file
if ($content -match $vatModeExists) { # If <VATMODE>X</VATMODE> is detected in the file...
break # ...then do not process the file (skip it)
}
# Get the matched RECORD CODE lines
$found = $content -match $regEx
for ($j=0; $j -lt $found.Length; $j++ ) { # Loop through each matched RECORD CODE line
echo $found[$j] $vatModeTag # Insert <VATMODE>X</VATMODE> right under RECORD CODE line
# save the files that now have VATMODE inserted into them, but how?
}
}
The above script is supposed to append VATMODE tags under each RECORD CODE line, as in the output below.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<EXPORT>
<IMPORTMODEL>NEX</IMPORTMODEL>
<SESSION>1000060</SESSION>
<CUSTORDERS>
<RECORD CODE="NX0100096">
<VATMODE>X</VATMODE>
<INPUTDATE>19/07/2017</INPUTDATE>
<!--...and so on...-->
In PowerShell ISE, the script runs fine with the echo (which is only there for my visual inspection), but how do I actually insert VATMODE and save the files I've added it to?
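One way to do the insertion itself, sketched on an in-memory string rather than a real file: wrap the RECORD CODE pattern in a capture group and let `[regex]::Replace` write the matched line back followed by a newline and the tag. The sample text below is an assumption modelled on the expected output above; the variable names `$sample` and `$updated` are illustrative only.

```powershell
# Sample text modelled on the expected output, minus the VATMODE line.
# Lines are joined with LF explicitly so the result does not depend on this script's own line endings.
$sample = @(
    '<CUSTORDERS>',
    '<RECORD CODE="NX0100096">',
    '<INPUTDATE>19/07/2017</INPUTDATE>'
) -join "`n"

# Same RECORD CODE pattern as in the script, wrapped in ( ) so the matched line can be reused as $1
$regEx = '(\W\w{6}\s\w{4}\W.+)'
$vatModeTag = "`t`t`t<VATMODE>X</VATMODE>"

# '$1' is single-quoted so PowerShell passes it through to the .NET regex engine untouched
$updated = [regex]::Replace($sample, $regEx, '$1' + "`n" + $vatModeTag)
$updated
```

Saving is then a matter of writing the resulting string back to the file it came from, for example with `Out-File` or `Set-Content`, instead of echoing it.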
Pseudocode
- Assign regexes
- Assign VATMODE tag
- Get files list
- Get files count
- Get content of each file separately
- Check if VATMODE already exists and, if so, skip that file
- Otherwise append VATMODE
- Save files that got the new VATMODE
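The steps above can be sketched as a single function. This is one possible arrangement, not the only one. The function name `Add-VatModeTag` and its `-Directory` parameter are hypothetical, the demo directory under the temp path is illustrative, and the file is written with `[System.IO.File]::WriteAllText` (UTF-8 without a byte order mark), which is a substitution of mine. Whether the accounting system cares about the BOM is unknown here.

```powershell
function Add-VatModeTag {   # hypothetical helper name
    param(
        [Parameter(Mandatory = $true)] [string] $Directory
    )

    # Assign regexes
    $recordPattern  = '(\W\w{6}\s\w{4}\W.+)'
    $vatModePattern = [regex]::Escape('<VATMODE>X</VATMODE>')
    # Assign VATMODE tag, indented with three tabs
    $vatModeTag = "`t`t`t<VATMODE>X</VATMODE>"

    # Get files list; foreach makes a separate file count unnecessary
    foreach ($file in (Get-ChildItem -Path $Directory -Filter *.xml)) {
        # Get content of each file separately, as one string
        $content = Get-Content -LiteralPath $file.FullName -Raw

        # Skip empty files and files that already contain the tag
        if (-not $content -or $content -match $vatModePattern) { continue }

        # Otherwise insert the tag under every RECORD CODE line
        $content = [regex]::Replace($content, $recordPattern, '$1' + "`n" + $vatModeTag)

        # Save the file that got the new tag, and emit its name for visual inspection
        [System.IO.File]::WriteAllText($file.FullName, $content)
        $file.Name
    }
}

# Usage against a throwaway directory (illustrative path and file content)
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) 'vatmode_demo'
New-Item -ItemType Directory -Path $demoDir -Force | Out-Null
$demoFile = Join-Path $demoDir 'NX1000060.xml'
[System.IO.File]::WriteAllText($demoFile, "<RECORD CODE=`"NX0100096`">`n<INPUTDATE>19/07/2017</INPUTDATE>`n")

$processed = Add-VatModeTag -Directory $demoDir
$processed
```

Running the function a second time on the same directory should leave the files untouched, because every file now contains the tag and is skipped.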