
How can I sum numbers extracted from substrings in a file and place the sum at a specific position on a different line using PowerShell, given the following conditions:

Get the sum of the numbers at positions 3 to 13 of every line that starts with the character D. Place the sum at positions 10 to 14 on the line that starts with S.

So, for example, if I have this file:

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        xxxxx

I want to get the sum of 38.95, 18.95, and 18.95 and then place the sum at the position of the xxxxx on the line that starts with S.

Gerson

2 Answers


You could try:

  • -match to find the lines using a regex pattern
  • the .NET string method Substring() to extract the values from the "D"-lines
  • Measure-Object -Sum to calculate the sum
  • -replace to insert the value (also searches using a regex pattern)
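
As a quick illustration of the first bullet: applied to an array, -match acts as a filter, returning the matching elements rather than a Boolean:

'DA001', 'S  x', 'DA002' -match '^D'   # -> DA001, DA002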

Ex:

$text = Get-Content -Path file.txt

$total = $text -match '^D' |
    # For each "D"-line, extract the value and cast it to double (to be able to sum it)
    ForEach-Object { $_.Substring(2,11) -as [double] } |
    # Measure the sum
    Measure-Object -Sum | Select-Object -ExpandProperty Sum

# Convert the sum to a string first: on a [double], PowerShell's synthetic
# .Length property is always 1, which would break the position math below.
$totalString = [string]$total

$text | ForEach-Object {
    if ($_ -match '^S') {
        # Line starts with S -> overwrite so that the sum ends at index 17
        # (assumes the "S"-line is at least 18 characters long)
        $_.Substring(0, (17 - $totalString.Length)) + $totalString + $_.Substring(17)
    } else {
        # Not an "S"-line -> output the original content
        $_
    }
} | Set-Content -Path file.txt
Frode F.
  • Hi Frode F., regarding your code: if I want to put the total at a calculated position in a field that starts with an S, say at positions 10 to 15 (which is the position of xxxxx in my example), what would be the alternative to `$_ -replace 'xxxxx', $total`? – Gerson Feb 21 '18 at 20:00
  • It depends. Will there always be a value in those positions, or can the string end before the 10th char? What will you do if the result is longer or shorter than 6 chars? – Frode F. Feb 21 '18 at 20:07
  • I will always put the sum at positions 10 through 18, like S00000003000000000000101.72; I will just overwrite whatever is there. So if I only have a sum of 100, then I will only overwrite the last 3 digits. If I have a sum of 4.2, then I will only overwrite the last 2. – Gerson Feb 21 '18 at 20:14
  • See updated answer. You may have to tune the numbers, as I'm not sure whether you want to end at position 17 or 18 (indexing from 0). – Frode F. Feb 21 '18 at 20:52
  • @FrodeF.: Good point about the culture-specific decimal mark (`.` vs. `,`) in general, but note that when using _string concatenation_, PowerShell actually uses the _invariant_ culture, where the decimal mark is always `.` E.g.: `$n = 1.2; 'val'+$n` -> `val1.2`, irrespective of the current culture. (There _are_ contexts where the current culture matters, however; e.g.: `'{0}' -f 1.2` and when outputting to the console; see [this answer](https://stackoverflow.com/a/37603732/45375) of mine.) – mklement0 Feb 21 '18 at 21:05
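
A quick demonstration of the behavior described in that comment (de-DE is just an example of a culture that uses `,` as the decimal mark):

$n = 1.2
'val' + $n    # string concatenation uses the invariant culture -> val1.2
'{0}' -f $n   # -f is culture-sensitive: under a culture such as de-DE this can yield 1,2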

PowerShell's switch statement has powerful but little-known features that allow you to iterate over the lines of a file (-file) and match lines against regular expressions (-regex).

Not only is switch -file convenient, it is also much faster than using cmdlets in a pipeline (see bottom section).

[double] $sum = 0

switch -regex -file file.txt {

  # Note: The string to the left of each script block below ({ ... }), 
  #       e.g., '^D', is the regex to match each line against.
  #       Inside the script blocks, $_ refers to the input line at hand.

  # Extract number, add to sum, output the line.
  '^D' { $sum += $_.Substring(2, 11); $_; continue }

  # Summary line: place sum at character position 10, with 0-padding
  # Note: `-replace ',', '.'` is only needed if your culture uses "," as the
  #       decimal mark.
  '^S' { $_.Substring(0, 9) + '{0:000000000000000.00}' -f $sum -replace ',', '.'; continue }
  
  # All other lines: pass them through.
  default { $_ }

}
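
(Note that the switch statement only outputs the rewritten lines; to persist them, capture the output in a variable, e.g. $updated = switch -regex -file ..., and write it back with Set-Content, as in the first answer.)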

Note:

  • continue in the script blocks short-circuits further matching for the line at hand; by contrast, if you used break, no further lines would be processed (see the toy example after these notes).
  • Based on a later comment, I'm assuming you want an 18-character 0-left-padded number on the S line at character position 10.
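
To make the continue/break distinction concrete, here is a toy example, run over an array instead of a file purely for illustration:

switch -regex ('D1', 'D2', 'S1') {
  '^D'    { "D-line: $_"; continue }  # continue -> proceed to the next input element
  '^S'    { "S-line: $_"; break }     # break    -> stop processing the remaining elements
  default { "other: $_" }
}
# Output: D-line: D1, D-line: D2, S-line: S1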

With your sample file, the above yields:

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        000000000000076.85
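
To see the padding in isolation: the custom .NET numeric format string used above has 15 integer-digit placeholders plus the decimal point and 2 decimal placeholders, yielding an 18-character, 0-left-padded field:

'{0:000000000000000.00}' -f 76.85   # -> 000000000000076.85 (18 characters; a culture
                                    #    using ',' would need the -replace shown above)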

Optional reading: Comparing the performance of switch -file ... to Get-Content ... | ForEach-Object ...

Running the following test script:

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds, 
  (Measure-Command { get-content $tmpFile | % { $_ }  }).TotalSeconds
  Remove-Item $tmpFile
}

yields the following timings on my machine, for instance (the absolute numbers aren't important, but their ratio should give you a sense):

0.0578924   # switch -file
6.0417638   # Get-Content | ForEach-Object

That is, the pipeline-based solution is about 100 (!) times slower than the switch -file solution.


Digging deeper:

Frode F. points out that Get-Content is slow with large files - though its convenience makes it a popular choice - and mentions using the .NET Framework directly as an alternative:

  • Using [System.IO.File]::ReadAllLines(); however, given that it reads the entire file into memory, that is only an option with smallish files.

  • Using [System.IO.StreamReader]'s ReadLine() method in a loop (a minimal sketch follows).
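
A minimal sketch of that ReadLine() loop, assuming a file.txt in the current directory (Convert-Path resolves the PowerShell path to a native path for the .NET API):

$reader = [System.IO.StreamReader] (Convert-Path file.txt)
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        $line   # process each line here
    }
}
finally {
    $reader.Dispose()   # ensure the file handle is released
}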

However, use of the pipeline in itself, irrespective of the specific cmdlets used, introduces overhead. When performance matters - but only then - you should avoid it.

Here's an updated test that includes commands that use the .NET Framework methods, with and without the pipeline (use of the intrinsic .ForEach() method requires PSv4+):

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds
  (Measure-Command { foreach ($line in [IO.File]::ReadLines((Convert-Path $tmpFile))) { $line } }).TotalSeconds
  (Measure-Command { 
    $sr = [IO.StreamReader] (Convert-Path $tmpFile)
    while(-not $sr.EndOfStream) { $sr.ReadLine() }
    $sr.Close() 
  }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)).ForEach({ $_ }) }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)) | % { $_ } }).TotalSeconds
  (Measure-Command { Get-Content $tmpFile | % { $_ }  }).TotalSeconds
  
  Remove-Item $tmpFile
}

Sample results, from fastest to slowest:

0.0124441 # switch -file
0.0365348 # [System.IO.File]::ReadLines() in foreach loop
0.0481214 # [System.IO.StreamReader] in a loop
0.1614621 # [System.IO.File]::ReadAllLines() with .ForEach() method
0.2745749 # (pipeline) [System.IO.File]::ReadAllLines() with ForEach-Object
0.5925222 # (pipeline) Get-Content with ForEach-Object

switch -file is the fastest by a factor of around 3, followed by the no-pipeline .NET solutions; using .ForEach() adds another factor of 3. Simply introducing the pipeline (ForEach-Object instead of .ForEach()) adds another factor of 2; finally, using the pipeline with Get-Content and ForEach-Object adds another factor of 2.

mklement0
  • +1. `Get-Content` is known to be slow with large files. Replace it with, e.g., `[system.io.file]::readalllines()` in the benchmark. Switch will probably be faster because of pipeline overhead, but the difference won't be that big. `Get-Content` is easy to read, explain, and use, so it's usually used in answers to get people started. Any script that's going to process large or many files will always need to be optimized (use .NET classes like File or StreamReader with `while($l = $streamreader.ReadLine())`). – Frode F. Feb 22 '18 at 05:51
  • Thanks, @FrodeF. (but where's my +1? :) Please see my update. To be clear: I picked `switch -file` in this case because it allowed for the most elegant (in my estimation) solution to the problem, not for performance reasons. I suspected the performance benefit, but when I looked into it I found the magnitude of it baffling. – mklement0 Feb 22 '18 at 13:50