I want to edit plain text files (MT940 Standard).

Here is an example file with dummy data:

-
:20:296535/00000010
:21:ABNADK2AXXX
:25:ABNADK2AXXX/DK88ABNA0496434500
:28C:42/00002
:60M:C230228EUR124792,65
:61:2302280228C1750,88NTRFC1165-23-00120//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK47ABNA0243508514/BIC/ABNADK2A/NAME/
LOOP BV/REMI/AV-RUN 24022023/202301918/EREF/C1165-23-00120
:61:2302280228C4695,98NTRF6381310605374038//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK14ABNA0456766324/BIC/ABNADK2A/NAME/
DEV BV/REMI/ID16145 DEB. 1657139 FACT. 202303668 20
2303685 202303689/EREF/638131060537403857-311-2
:61:2302280228C1349,25NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK46ABNA0513892443/BIC/ABNADK2A/NAME/
EXAMPLE COM/REMI/202303656/EREF/NOTPROVIDED
:61:2302280228C55845,96NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK35ABNA0442867689/BIC/ABNADK2A/NAME/
BATH COMPANY DK/REMI/INV. 202228255-8426, OUR REF 2022611
73-79/EREF/NOTPROVIDED
:61:2302280228D105000,NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK98INGB0657624985/BIC/INGBDK2A/NAME/
TEST/REMI/OVERBOEKING/EREF/NOTPROVIDED
:62F:C230228EUR83434,72
:64:C230228EUR83434,72
:86:/ACSI/ABNADK2AXXX
-
:20:STARTUMS TA FW
:25:28020050/0521322890
:28C:017/01
:60F:C230228GBP1473111,27
:61:2302280228D1919,29N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20COMPANY?21S LT
D?22TRN AZV2023022800746?23URSP.-BETR.1.900,00 GBP?24KURS 0,87716
0 EUR ZU GBP?25GEGENWERT      2.00,08 EUR?26PROVISION FIX      7
,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?28FREMDE GEB.       12,50 E
UR?2917.02 413337?3028020050?310537246190?32HOMETESTEXAMPLE?33S 
LTD?34003
:61:2302280228D16988,81N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20BODO GU COM?21NOT A TEST?22TRN
AZV2023022800749?23URSP.-BETR.16.980,48 GBP?24
KURS 0,877160 EUR ZU GBP?25GEGENWERT     19.358,48 EUR?26PROVISIO
N FIX      7,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?2830.01 INV-278
0?29*LOREM*?3028020050?310537246190?32GOLL
?33GOL COM?34003?60INFO 0800-1234
*GEB-FREI*
:61:2302280228D867,06N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20NOTACOMPANY?21LTD?2
2TRN AZV2023022800752?23URSP.-BETR.858,73 GBP?24KURS 0,877160 EUR
 ZU GBP?25GEGENWERT        978,99 EUR?26PROVISION FIX      7,50 E
UR?27SWIFT-/TELE-SPESEN 2,00 EUR?2828.01 A221322?3028020050?31053
7246190?32KOLL?33LTD?34003
:62F:C230228GBP1453336,11
-

The script should search for lines that start with :86: and are not immediately followed by a slash, four characters, and another slash.

The regex for this is: ^:86:(?!/..../)
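
To illustrate what this pattern selects, here is a minimal sketch that runs it against four lines copied from the example file above. The variable names `$lines` and `$hits` are only for illustration.

```powershell
# Sample lines taken from the example file above
$lines = @(
    ':86:/TRTP/SEPA OVERBOEKING/IBAN/DK47ABNA0243508514/BIC/ABNADK2A/NAME/',
    ':86:/ACSI/ABNADK2AXXX',
    ':86:206?00AUSL-ZAHLUNG?100004649?20COMPANY?21S LT',
    ':61:2302280228D1919,29N020NONREF'
)

# Keep only the :86: lines that are NOT followed by a slash, four chars, and a slash
$hits = @($lines | Where-Object { $_ -match '^:86:(?!/..../)' })

# Only the ':86:206?00AUSL-ZAHLUNG...' line should remain
$hits
```

The first two lines are skipped because `/TRTP/` and `/ACSI/` satisfy the lookahead, and the last one is skipped because it does not start with `:86:`.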

From each matched line, the script should search upwards for the nearest line containing only a "-" and mark it as the start of the section to be erased. From the matched line it should also search downwards for the next line containing only a "-" and use it (including the "-") as the end marker of the section to be erased.

This algorithm should loop through the whole file.

I have this script, and it works almost perfectly. BUT it does not use the "-" before the matched pattern. Instead, it uses the pattern line itself as the start of the section to be erased.

Can someone tell me what the problem is?

# Specify the path to the input file
$inputFilePath = "V:\Temp\finance\TestKopie.A01"

# Specify the path to the output file
$outputFilePath = "V:\Temp\finance\Hacked.A01"

# Function to remove sections based on pattern and "-"
function RemoveSections($content) {
    $outputContent = @()
    $eraseMode = $false
    $previousLine = ""

    for ($i = 0; $i -lt $content.Length; $i++) {
        $line = $content[$i]

        if ($line -match "^:86:(?!/..../)") {
            $eraseMode = $true

            # Find the previous "-" line
            $previousLineIndex = $i - 1
            while ($previousLineIndex -ge 0 -and $content[$previousLineIndex] -ne "-") {
                $previousLineIndex--
            }
            if ($previousLineIndex -ge 0) {
                $outputContent += $content[$previousLineIndex]
            }
        }

        if ($eraseMode -and $line -eq "-") {
            $eraseMode = $false

            # Find the next "-" line
            $nextLineIndex = $i + 1
            while ($nextLineIndex -lt $content.Length -and $content[$nextLineIndex] -ne "-") {
                $nextLineIndex++
            }
            if ($nextLineIndex -lt $content.Length) {
                $i = $nextLineIndex + 1  # Skip the section between "-" lines, including the next "-"
                continue
            }
        }

        if (!$eraseMode) {
            $outputContent += $line
        }
    }

    return $outputContent
}

# Read the input file content
$inputContent = Get-Content $inputFilePath

# Initialize variables
$iteration = 0
$linesRemoved = 0

# Remove sections based on pattern and "-" until no more changes occur
do {
    $iteration++
    Write-Host "Iteration: $iteration"
    Write-Host "Lines removed: $linesRemoved"
    $linesRemoved = 0

    # Remove sections and count the lines removed
    $outputContent = RemoveSections $inputContent
    $linesRemoved = ($inputContent.Length - $outputContent.Length)

    # Output progress
    Write-Host "Lines removed in this iteration: $linesRemoved"
    Write-Host "----------------------------"

    # Update the input content for the next iteration
    $inputContent = $outputContent
} while ($linesRemoved -gt 0)

# Save the modified content to the output file
$outputContent | Out-File $outputFilePath -Force
Write-Host "Process complete. Modified content saved to $outputFilePath"

EDIT: Here is the working script based on the regex-pattern of @wiktor-stribiżew :-)

# Specify the path to the input file
$inputFilePath = "V:\Temp\finance\Test.A01"

# Specify the path to the output file
$outputFilePath = "V:\Temp\finance\Hacked.A01"

# Read the input file content
$inputContent = Get-Content $inputFilePath -Raw

# Perform the replacement
$modifiedContent = $inputContent -replace '(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*'

# Save the modified content to the output file
$modifiedContent | Set-Content $outputFilePath -Force
  • So, in the end, you want to end up with something achieved with `(Get-Content $inputFilePath -Raw) -replace '(?m)^-(?:\r?\n(?!-\r?$).*)*?^:86:(?!/..../).*(?:\n(?!-\r?$).*)*' > $outputFilePath`? – Wiktor Stribiżew Jul 12 '23 at 09:46
  • I wrote it in a script and tried it. It does not do exactly what I want. Maybe I did something wrong. But your regex knowledge is amazing. Respect. I will explain it a bit more – SECØND BANANA Jul 12 '23 at 11:36
  • I have now used your expression on my example data here https://regex101.com/r/ov3nhM/1 and it seems to work. Hmm... maybe I made a mistake when implementing it in a script – SECØND BANANA Jul 12 '23 at 11:43
  • Your script does not work because you are reading the file line by line. See the `-raw` argument? It makes the whole file contents be put into the variable, and then you can use the regex that matches across multiple lines. You do not need any `try` or `while` with this approach. Just use `$output = $inputStream.Read() -replace $pattern` and then `$outputStream.Write($output)`. – Wiktor Stribiżew Jul 12 '23 at 11:49
  • I updated the script in my initial post. Now it has been running for 15 minutes and is still not finished. Did I overlook something, or does this really take that long? – SECØND BANANA Jul 12 '23 at 12:13
  • If your file is huge (gigabytes), then it might take quite some time. – Wiktor Stribiżew Jul 12 '23 at 12:16
  • No, it is just 92 KB. That's what makes me wonder. In regex101 the complete dataset ran in just 104 ms. – SECØND BANANA Jul 12 '23 at 12:18
  • How interesting. What if you use `'(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*'` as regex? – Wiktor Stribiżew Jul 12 '23 at 12:54
  • That worked. And it was fast :) Thank you very much Wiktor. – SECØND BANANA Jul 12 '23 at 14:41
  • Thanks for providing information about the solution, but please note the best way to present a solution is in the form of an _answer post_ rather than by editing the information into your _question_ (it is perfectly acceptable to [answer your own question](http://stackoverflow.com/help/self-answer)). – mklement0 Jul 12 '23 at 15:01

2 Answers

You can use

(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*

to simply remove the whole block of text from the entire text contents once you load it into memory as a single string.

See the regex demo. Details:

  • (?sm) - regex flags that tell the regex engine to make ^ and $ match start/end of any line (m) and to make the . match newlines, too (s)
  • ^ - matches start of a line
  • - - a - char
  • (?:(?!^-\r?$).)*? - any char, zero or more but as few as possible occurrences, that is not a single - on an entire line
  • ^:86: - start of a line and :86:
  • (?!/..../) - immediately to the right, there must be no / + four any chars + /
  • (?:(?!^-\r?$).)* - any char, zero or more but as many as possible occurrences, that is not a single - on an entire line.
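
As a quick check of the breakdown above, the following sketch applies the pattern to a small in-memory string. The two dummy blocks (`:20:KEEP` and `:20:DROP`) are made up for illustration and are not from the asker's files.

```powershell
# Two tiny dummy blocks, each introduced by a line holding only "-"
$text = @(
    '-'
    ':20:KEEP'
    ':86:/TRTP/SEPA'
    '-'
    ':20:DROP'
    ':86:206?00AUSL-ZAHLUNG'
    ':62F:C230228GBP1,00'
    '-'
) -join "`n"

$pattern = '(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*'

# -replace with no replacement string removes every match
$result = $text -replace $pattern
$result
```

The block containing `:86:206?00AUSL-ZAHLUNG` is removed together with its leading `-` line, while the first block and the final `-` line are left in place. The `-` inside `AUSL-ZAHLUNG` does not stop the match because it is not alone on its line.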

In PowerShell, you can use

# Specify the path to the input file
$inputFilePath = "V:\Temp\finance\Test.A01"

# Specify the path to the output file
$outputFilePath = "V:\Temp\finance\Hacked.A01"

# Read the input file content
$inputContent = Get-Content $inputFilePath -Raw

# Perform the replacement
$modifiedContent = $inputContent -replace '(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*'

# Save the modified content to the output file
$modifiedContent | Set-Content $outputFilePath -Force

NOTE: Since the s flag is in use, you probably want to replace /..../ with /[^\r\n]{4}/ to match any four chars that are not carriage returns nor line feed chars.
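
A small sketch of why this matters, using a hypothetical two-line sample (`$sample` is invented for illustration): with the s flag, the four dots may span a line break, so a slash on the following line can satisfy the lookahead by accident.

```powershell
# Hypothetical input: the closing slash sits on the NEXT line
$sample = ":86:/AB`nC/REST"

# With (?s), the dots in /..../ may cross the line break, so the lookahead sees "/AB<LF>C/"
$loose  = $sample -match '(?s)^:86:(?!/..../)'

# Restricting the four chars to non-line-break chars keeps the check on one line
$strict = $sample -match '(?s)^:86:(?!/[^\r\n]{4}/)'

"loose:  $loose"
"strict: $strict"
```

Here `$loose` is `$false` (the line is wrongly treated as having a `/xxxx/` code), while `$strict` is `$true`.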

Wiktor Stribiżew

I modified your script so that it iterates through every file in a folder, runs the replacement, and then saves each file under the same name in another folder.

It runs forever on just 2 files of 92 KB and 23 KB. Is this a problem with the -Raw import again?

# Specify the path to the input folder
$inputFolderPath = "V:\Temp\finance\input"

# Specify the path to the output folder
$outputFolderPath = "V:\Temp\finance\output"

# Get all files in the input folder
$inputFiles = Get-ChildItem -Path $inputFolderPath -File

foreach ($inputFile in $inputFiles) {
    # Construct the output file path
    $outputFilePath = Join-Path -Path $outputFolderPath -ChildPath $inputFile.Name

    # Read the input file content
    $inputContent = Get-Content -Path $inputFile.FullName -Raw

    # Perform the replacement
    $modifiedContent = $inputContent -replace '(?m)^-(?:\r?\n(?!-\r?$).*)*?^:86:(?!/..../).*(?:\n(?!-\r?$).*)*'

    # Save the modified content to the output file
    $modifiedContent | Set-Content -Path $outputFilePath -Force

    Write-Output "Modified content saved to: $outputFilePath"
}

Write-Output "Process complete."

Here are the 2 sample files I used.

Example1.A01

-
:20:296535/00000010
:21:ABNADK2AXXX
:25:ABNADK2AXXX/DK88ABNA0496434500
:28C:42/00002
:60M:C230228EUR124792,65
:61:2302280228C1750,88NTRFC1165-23-00120//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK47ABNA0243508514/BIC/ABNADK2A/NAME/
LOOP BV/REMI/AV-RUN 24022023/202301918/EREF/C1165-23-00120
:61:2302280228C4695,98NTRF6381310605374038//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK14ABNA0456766324/BIC/ABNADK2A/NAME/
DEV BV/REMI/ID16145 DEB. 1657139 FACT. 202303668 20
2303685 202303689/EREF/638131060537403857-311-2
:61:2302280228C1349,25NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK46ABNA0513892443/BIC/ABNADK2A/NAME/
EXAMPLE COM/REMI/202303656/EREF/NOTPROVIDED
:61:2302280228C55845,96NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK35ABNA0442867689/BIC/ABNADK2A/NAME/
BATH COMPANY DK/REMI/INV. 202228255-8426, OUR REF 2022611
73-79/EREF/NOTPROVIDED
:61:2302280228D105000,NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK98INGB0657624985/BIC/INGBDK2A/NAME/
TEST/REMI/OVERBOEKING/EREF/NOTPROVIDED
:62F:C230228EUR83434,72
:64:C230228EUR83434,72
:86:/ACSI/ABNADK2AXXX
-
:20:STARTUMS TA FW
:25:28020050/0521322890
:28C:017/01
:60F:C230228GBP1473111,27
:61:2302280228D1919,29N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20COMPANY?21S LT
D?22TRN AZV2023022800746?23URSP.-BETR.1.900,00 GBP?24KURS 0,87716
0 EUR ZU GBP?25GEGENWERT      2.00,08 EUR?26PROVISION FIX      7
,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?28FREMDE GEB.       12,50 E
UR?2917.02 413337?3028020050?310537246190?32HOMETESTEXAMPLE?33S 
LTD?34003
:61:2302280228D16988,81N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20BODO GU COM?21NOT A TEST?22TRN
AZV2023022800749?23URSP.-BETR.16.980,48 GBP?24
KURS 0,877160 EUR ZU GBP?25GEGENWERT     19.358,48 EUR?26PROVISIO
N FIX      7,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?2830.01 INV-278
0?29*LOREM*?3028020050?310537246190?32GOLL
?33GOL COM?34003?60INFO 0800-1234
*GEB-FREI*
:61:2302280228D867,06N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20NOTACOMPANY?21LTD?2
2TRN AZV2023022800752?23URSP.-BETR.858,73 GBP?24KURS 0,877160 EUR
 ZU GBP?25GEGENWERT        978,99 EUR?26PROVISION FIX      7,50 E
UR?27SWIFT-/TELE-SPESEN 2,00 EUR?2828.01 A221322?3028020050?31053
7246190?32KOLL?33LTD?34003
:62F:C230228GBP1453336,11
-

Example1.A02

:20:2303191/10060276
:25:TERPPLPP123/PL97150011711211700657640000
:28C:2382/1
:60F:C230317PLN47131,36
:62F:C230317PLN47131,36
:64:C230317PLN47131,36
:65:C230317PLN47131,36
-
:20:6576249850000001
:25:KOP123/DK98INGB0657624985EUR
:28C:78/1
:60F:D230318EUR294657,59
:62F:D230319EUR294657,59
:64:D230319EUR294657,59
:65:D230320EUR294657,59
:65:D230321EUR294657,59
:86:/NAME/ROK//BIC/KOP//SUM/0/0/0,00/0,00/
-
:20:2303201/10060276
:25:TERPPLPP123/PL97150011711211700657640000
:28C:2383/1
:60F:C230319PLN47131,36
:62F:C230319PLN47131,36
:64:C230319PLN47131,36
:65:C230319PLN47131,36
-
:20:0096803070000001
:25:KOP123/DK12INGB0009680307EUR
:28C:78/1
:60F:C230318EUR536088,75
:62F:C230319EUR536088,75
:64:C230319EUR536088,75
:65:C230320EUR536088,75
:65:C230321EUR536088,75
:86:/NAME/NO COMP//BIC/KOP//SUM/0/0/0,00/0,00
/
-
:20:2303191/10060276
:25:TERPPLPP123/PL55150011711211700657510000
:28C:2382/1
:60F:C230317PLN4202368,10
:61:2303150317DN566,25NTRFNONREF//074/23031900001
VB LOREM IPSUM
:86: VB ELECTR. 5151468 JUST A BANK TEST
Ref:2033249
:61:230317DN8709,91NTRFNONREF//074/23031900002
z/389/02/2023
:86:82105017641000002272840402 10501764 BANK
Eurotrans Sp. z o.o. z/389/02/2023 Ref:806492323
:61:230317DN25533,32NTRFNONREF//074/23031900003
fc2301166,2301169,2301170,2301
:86:72175011520000000020335181 17501152 BNPPL O./GEPP
fc2301166,2301169,2301170,2301176 Ref:806492325
:61:230317DN140,22NTRFNONREF//074/23031900004
31/12521324
:86:15175013125650000001578188 17501312 EXAMPLE COMP 31/12521324
Ref:806492326
:62M:C230317PLN4167418,40
:64:C230317PLN4167418,40
:65:C230317PLN4167418,40
-
  • This is a part of your question, not an answer. If you need an answer for this, please share the file you have trouble with. – Wiktor Stribiżew Jul 13 '23 at 09:02
  • I added two files in the comment. With this it seems to work. What's weird, again, is that it takes very long for so few lines of input. – SECØND BANANA Jul 13 '23 at 09:41
  • It looks like the problem is with the `.*` to match the rest of the lines before checking the `:86:` text, and the negative lookahead. I would probably add a negative lookahead to check for a non-qualifying paragraph (to discard matching it any further) and use an atomic group for the line matching `.*`: `(?m)^-(?!(?:\r?\n(?!-\r?$)(?>.*))*?\n:86:/..../)(?:\r?\n(?!-\r?$).*)*?^:86:.*(?:\n(?!-\r?$).*)*` – Wiktor Stribiżew Jul 13 '23 at 10:15
  • Ah, perfect. I cannot quite follow why exactly that works, but it does the job. I added your hint to replace `/..../` with `/[^\r\n]{4}/` and I added another transformation to delete multiple line breaks after your replacement action. ` # Perform the replacement $modifiedContent_rep = $inputContent -replace '(?m)^-(?!(?:\r?\n(?!-\r?$)(?>.*))*?\n:86:/[^\r\n]{4}/)(?:\r?\n(?!-\r?$).*)*?^:86:.*(?:\n(?!-\r?$).*)*' # Replace multiple line breaks $modifiedContent = $modifiedContent_rep -replace "(?:\r?\n){2,}", "`n"` – SECØND BANANA Jul 13 '23 at 10:59
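
For comparison, the same block removal can be sketched without any multi-line regex, by grouping lines into blocks that start at a "-" line and dropping the flagged ones. This is not the approach used in the answers above, and the function name `Remove-FlaggedBlock` is hypothetical. It assumes PowerShell 5 or later (for `::new()`) and, like the regex, it only ever drops blocks that begin with a "-" line.

```powershell
function Remove-FlaggedBlock {
    param([string[]]$Lines)

    $blocks  = [System.Collections.Generic.List[object]]::new()
    $current = [System.Collections.Generic.List[string]]::new()

    foreach ($line in $Lines) {
        # A line holding only "-" closes the running block and opens a new one
        if ($line -eq '-' -and $current.Count -gt 0) {
            $blocks.Add($current)
            $current = [System.Collections.Generic.List[string]]::new()
        }
        $current.Add($line)
    }
    if ($current.Count -gt 0) { $blocks.Add($current) }

    foreach ($block in $blocks) {
        $startsWithDash = $block[0] -eq '-'
        $flagged = @($block | Where-Object { $_ -match '^:86:(?!/..../)' }).Count -gt 0
        # Emit the lines of every block that is not both dash-led and flagged
        if (-not ($startsWithDash -and $flagged)) { $block }
    }
}

# Usage with made-up dummy lines
$demo = '-', ':20:KEEP', ':86:/TRTP/SEPA', '-', ':20:DROP', ':86:206?00AUSL', '-'
$kept = @(Remove-FlaggedBlock -Lines $demo)
$kept
```

For a file, the lines could be read with `Get-Content` (without `-Raw`) and the result written with `Set-Content`. Because each line is tested once against a single-line pattern, this sketch avoids the multi-line matching that appeared to cause the long runtimes discussed in the comments.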