
I want to combine every two lines of the input below. Here is the input:

ALPHA-FETOPROTEIN      ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3
     ###-##-####    #######,####        In lab
ALPHA-FETOPROTEIN      ROUTINE    CH 0203 234   02/03/2023@11:05 LIVER
     ###-##-####    ########,########   In lab
ANION GAP              STAT       CH 0203 124   02/03/2023@11:06 DAY
     ###-##-####    ######,##### ####   In lab
BASIC METABOLIC PANE   ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3
     ###-##-####    #######,#### ###### In lab

This is the desired output:

ALPHA-FETOPROTEIN      ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3 ###-##-####    #######,####        In lab
ALPHA-FETOPROTEIN      ROUTINE    CH 0203 234   02/03/2023@11:05 LIVER ###-##-####    ########,########   In lab
ANION GAP              STAT       CH 0203 124   02/03/2023@11:06 DAY ###-##-####    ######,##### ####   In lab
BASIC METABOLIC PANE   ROUTINE    CH 0203 001   02/03/2023@10:45 LIVERF3 ###-##-####    #######,#### ###### In lab

The code that I have tried is:

for($i = 0; $i -lt $splitLines.Count; $i += 2){
  $splitLines[$i,($i+1)] -join ' '
}

It came from Joining every two lines in Powershell output, but I can't seem to get it to work for me. I'm not well versed in PowerShell, but I'm at the mercy of what's available at work.

Edit: Here is the entire code that I am using, as requested.

# SET VARIABLES
$inputfile = "C:\Users\Will\Desktop\testfile.txt"
$outputfile = "C:\Users\Will\Desktop\testfileformatted.txt"
$new_output = "C:\Users\Will\Desktop\new_formatted.txt"

# REMOVE EXTRA CHARACTERS
$remove_beginning_capture = "-------------------------------------------------------------------------------"
$remove_end_capture = "==============================================================================="
$remove_line = "------"
$remove_strings_with_spaces = "            \d"
Get-Content $inputfile | Where-Object {$_ -notmatch $remove_beginning_capture} | Where-Object {$_ -notmatch $remove_end_capture} | Where-Object {$_ -notmatch $remove_line} | Where-Object {$_ -notmatch $remove_strings_with_spaces}  | ? {$_.trim() -ne "" } | Set-Content $outputfile

# Measures line length for loop
$file_lines = gc $outputfile | Measure-Object

#Remove Whitespace
# $whitespace_removed = (Get-Content $outputfile -Raw) -replace '\s+', ' '| Set-Content -Path C:\Users\Will\Desktop\new_formatted.csv

# Combine every other line
$lines = Get-Content $outputfile -Raw
$newcontent = $lines.Replace("`n","")
Write-Host "Content: $newcontent"
$newcontent | Set-Content $new_output

for($i = 0; $i -lt $splitLines.Count; $i += 2){
  $splitLines[$i,($i+1)] -join ' '
}
Baby Yoda
  • Please [edit] your question to improve your [mcve]. In particular, elaborate on the meaning of your statement *I can't seem to get it to work for me*. – JosefZ Feb 04 '23 at 20:28
  • The code you are showing here is working as expected. It is the code you are not showing, that is the root of the problem. ;-) – zett42 Feb 04 '23 at 21:41

2 Answers


Just read two lines at a time and then write one combined line:

$inputFilename = "c:\temp\test.txt"
$outputFilename = "c:\temp\test1.txt"

$reader = [System.IO.StreamReader]::new($inputFilename)
$writer = [System.IO.StreamWriter]::new($outputFilename)
while(($line = $reader.ReadLine()) -ne $null)
{
   $secondLine = ""
   if(!$reader.EndOfStream){ $secondLine = $reader.ReadLine() }

   $writer.WriteLine($line + $secondLine)
}
$reader.Close()
$writer.Flush()
$writer.Close()
jdweng
  • You're doing the PowerShell community a disservice if you keep posting solutions that treat PowerShell like nothing but a dialect of C#. While such solutions _technically_ work (a testament to PowerShell's versatility), you'll miss out on what makes PowerShell PowerShell: its many high-level abstractions, which make for more concise solutions. As an aside: you don't need to `.Flush()` a writer just before calling `.Close()`. – mklement0 Feb 05 '23 at 03:08
  • @mklement0 : Which is more efficient? You again are wrong. I've seen lots of cases where flush is needed, especially when the output is small. Also PS doesn't always close the file unless you close the PS window. – jdweng Feb 05 '23 at 11:15
  • @mklement0 PowerShell supports it, it is not a dialect but proper PowerShell. PowerShell itself is built on top of the .NET CLR. It is like saying that using WinAPI in C# is wrong and should not be done. A statement that provides nothing to any answer. –  Feb 05 '23 at 14:52
  • @mklement0 : My solution is much cleaner, less math, and more efficient. To do modulo arithmetic is ridiculous in this case. You are simply trying to do something the wrong way so you can say it can be done in PS, when it shouldn't. PS does not have a method to read one line at a time. – jdweng Feb 05 '23 at 15:09
  • The question isn't about performance, and while you can _fall back to_ direct use of .NET APIs (again, a testament to PowerShell's versatility) _if and when PowerShell's performance falls short_ in a given use case, this fallback - typically resulting in more code, and notably requiring familiarity with a different knowledge domain - shouldn't be the _default_ approach, which it is in your answers. – mklement0 Feb 05 '23 at 17:24
  • This is leaving aside that there are often PowerShell-native ways to improve performance too - in fact, if you use `Set-Content out.txt -Value $(...)` around the `switch` solution, it'll outperform your solution - try the benchmarks at https://gist.github.com/mklement0/e8cabb620342af37ae7d0faecba7d588#file-bm_75347681-ps1 – mklement0 Feb 05 '23 at 17:24
  • Finally, to address your various incorrect claims: (a) As for flushing - see https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs,ab19992c30648b1d for proof that `.Close()` _automatically_ flushes. If it didn't, it would amount to a serious design flaw. (b) `Get-Content` and `switch -File` (as shown in my answer) are two examples of PowerShell-native ways to read files line by line (though you can't directly read from _multiple_ files line by line in a loop/pipeline without help from either a steppable pipeline or .NET APIs). – mklement0 Feb 05 '23 at 17:38
  • What the "dialect" metaphor was meant to say: you're using PowerShell _as if it were C#_, which is ill-advised. – mklement0 Feb 05 '23 at 17:38
  • Code looks good to me, though I agree with mklement0 that you don't need to call `.Flush()` in this case. You could also just call `.Dispose()` with a loop: `$reader, $writer | % Dispose`. The only time `.Flush()` was needed in my experience is when dealing with MemoryStreams and in some very specific circumstances. – Santiago Squarzon Feb 05 '23 at 17:43
  • @mklement0 : Where does it ever say that c# is "ill-advised" in PS? – jdweng Feb 05 '23 at 18:10
  • Nowhere. I didn't say it, and neither would I ever say it. What I _did_ say (and meant to say) is: if you treat language X as if it were language Y, you'll end up writing un-idiomatic code that misses out on what makes language X worthwhile. In the case of PowerShell vs. C# / direct use of .NET APIs this means writing usually much more verbose code, while also potentially inducing newcomers to do so. This particular answer may not be the most obvious example, but it is part of a larger pattern; [this previous answer of yours](https://stackoverflow.com/a/74395522/45375) is a better example. – mklement0 Feb 05 '23 at 19:27
  • @mklement0 " Read you own words : "What the "dialect" metaphor was meant to say: you're using PowerShell as if it were C#, which is ill-advised." You can't do everything in PS format that you can in c#. Somethings in PS you must use c# format like DllImport. It is best to "Keep It Simple". If c# make its simpler than use c#. – jdweng Feb 05 '23 at 19:57
  • In addition to all of the above, I just want to mention that stuff like Get-Content, Set-Content, etc. is literally pure c# under the hood, and that some trickery with set-content and switch is still just pure c#, which in itself calls loads of Winapi c++ parts. After more comments I do understand @mklement0's statements and where they are coming from, though I don't agree with all of them. –  Feb 05 '23 at 21:41
  • @Max, PowerShell is built on _.NET_, not C# (that C# happens to be the language in which PowerShell itself is implemented is incidental). PowerShell provides its own language and commands (cmdlets), which operate on a higher level of abstraction than C# and .NET APIs; not using them means missing out on what the language has to offer. Direct use of .NET APIs is needed in two cases: (a) if no native PowerShell cmdlet is available for a given task or (b) to improve performance _if actually needed_. Direct use of .NET APIs always requires _additional knowledge_ and is usually more verbose. – mklement0 Feb 05 '23 at 22:18
  • @Max, as previously noted, this particular answer isn't the best example of "C# in PowerShell clothing", but it is a part of a larger pattern; to add another more obvious example to the previously mentioned one: https://stackoverflow.com/a/75347998/45375. My answer now shows a PowerShell-idiomatic solution that is conceptually much simpler; the `-replace`-based solution even significantly outperforms this answer, but PowerShell isn't about performance, and optimization should only be attempted _if actually needed_. – mklement0 Feb 05 '23 at 22:19
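
Tying the disposal discussion above together, here is a variation of this answer's approach, shown only as a sketch and not part of the original answer: it disposes the reader and writer in a finally block (Dispose() also flushes and closes, so no separate Flush() call is needed), and, as an assumption to match the desired output in the question, it joins each pair with a single space and trims the second line's leading whitespace.

# Sketch only - not part of the original answer.
$inputFilename = "c:\temp\test.txt"
$outputFilename = "c:\temp\test1.txt"

$reader = [System.IO.StreamReader]::new($inputFilename)
$writer = [System.IO.StreamWriter]::new($outputFilename)
try
{
   while ($null -ne ($line = $reader.ReadLine()))
   {
      # Read the second line of the pair, if there is one.
      $secondLine = ""
      if (!$reader.EndOfStream) { $secondLine = $reader.ReadLine() }

      # Assumed adjustment: join with a single space and drop the
      # second line's leading whitespace to match the desired output.
      $writer.WriteLine(($line + ' ' + $secondLine.TrimStart()).TrimEnd())
   }
}
finally
{
   # Dispose() flushes and closes the underlying streams.
   $reader.Dispose()
   $writer.Dispose()
}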

PowerShell-idiomatic solutions:

Use Get-Content with -ReadCount 2 in order to read the lines from your file in pairs, which allows you to process each pair in a ForEach-Object call, where the constituent lines can be joined to form a single output line.

Get-Content -ReadCount 2 yourFile.txt | 
  ForEach-Object { $_[0] + ' ' +  $_[1].TrimStart() }

The above directly outputs the resulting lines (as the for command in your question does), causing them to print to the display by default.

Pipe to Set-Content to save the output to a file:

Get-Content -ReadCount 2 yourFile.txt | 
  ForEach-Object { $_[0] + ' ' +  $_[1].TrimStart() } |
  Set-Content yourOutputFile.txt

Performance notes:

  • Unfortunately (as of PowerShell 7.3.2), Get-Content is quite slow by default - see GitHub issue #7537, and the performance of ForEach-Object and Where-Object could be improved too - see GitHub issue #10982.

  • At the expense of collecting all inputs and outputs in memory first, you can noticeably improve the performance with the following variation, which avoids the ForEach-Object cmdlet in favor of the intrinsic .ForEach() method, and, instead of piping to Set-Content, passes all output lines via the -Value parameter:

    Set-Content yourOutputFile.txt -Value (
      (Get-Content -ReadCount 2 yourFile.txt).ForEach({ $_[0] + ' ' + $_[1].TrimStart() })
    )
    
  • Read on for even faster alternatives, but remember that optimizations are only worth undertaking if actually needed - if the first PowerShell-idiomatic solution above is fast enough in practice, it is worth using for its conceptual elegance and concision.


A better-performing alternative is to use a switch statement with the -File parameter to process files line by line:

$i = 1
switch -File yourFile.txt {
  default {
    if ($i++ % 2) { $firstLineInPair = $_ }
    else          { $firstLineInPair + ' ' + $_.TrimStart() } 
  }
} 

Helper index variable $i and the modulo operation (%) are simply used to identify which line is the start of a (new) pair, and which one is its second half.
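
As a minimal illustration of that bookkeeping (not part of the original answer): odd counter values leave a remainder of 1, which is truthy and marks the first line of a pair, while even values leave 0, which is falsy and marks the second line.

# Illustration only: how a counter combined with % 2 alternates.
1..4 | ForEach-Object { '{0}: {1}' -f $_, [bool]($_ % 2) }
# 1: True     (first line of pair 1)
# 2: False    (second line of pair 1)
# 3: True     (first line of pair 2)
# 4: False    (second line of pair 2)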

  • The switch statement is itself streaming, but it cannot be used as-is as pipeline input. By enclosing it in & { ... }, it can, but that forfeits some of the performance benefits, making it only marginally faster than the optimized Get-Content -ReadCount 2 solution:

    & {
      $i = 1
      switch -File yourFile.txt {
        default {
          if ($i++ % 2) { $firstLineInPair = $_ }
          else          { $firstLineInPair + ' ' + $_.TrimStart() } 
        }
      } 
    } | Set-Content yourOutputFile.txt
    
  • For the best performance when writing to a file, use Set-Content $outFile -Value $(...), albeit at the expense of collecting all output lines in memory first:

    Set-Content yourOutputFile.txt -Value $(
      $i = 1
      switch -File yourFile.txt {
        default {
          if ($i++ % 2) { $firstLineInPair = $_ }
          else          { $firstLineInPair + ' ' + $_.TrimStart() } 
        }
      } 
    )
    

The fastest and most concise solution is to use a regex-based approach, which reads the entire file up front:

(Get-Content -Raw yourFile.txt) -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2'

Note:

  • The assumption is that all lines are paired, and that the last line has a trailing newline.

  • The -replace operation matches two consecutive lines and joins them together with a space, ignoring leading spaces on the second line. For a detailed explanation of the regex and the ability to interact with it, see this regex101.com page. A short inline demo follows after this list.

  • To save the output to a file, you can pipe directly to Set-Content:

    (Get-Content -Raw yourFile.txt) -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2' |
      Set-Content yourOutputFile.txt
    
    • In this case, because the pipeline input to Set-Content is provided by an expression rather than by per-input-line script-block calls ({ ... }) (as the switch solution requires), there is virtually no slowdown resulting from use of the pipeline (whose use is generally preferable for conceptual elegance and concision).
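
As a quick sanity check (an illustration, not part of the original answer), the same -replace can be applied to a small inline sample; LF-only line endings (`n) are used here to keep the embedded string simple:

# Demo only: two pairs of lines built with embedded LF characters.
$sample = "FIRST A`n     second A`nFIRST B`n     second B`n"
$sample -replace '(.+)\r?\n(?: *)(.+\r?\n)', '$1 $2'
# Output:
# FIRST A second A
# FIRST B second B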

As for what you tried:

The $splitLines-based solution in your question is predicated on having assigned all lines of the input file to this self-chosen variable as an array, which your code does not do.

While you could fill variable $splitLines with an array of lines from your input file with $splitLines = Get-Content yourFile.txt, the switch-based line-by-line solution is more efficient (given that Get-Content reads text files line by line by default) and streams its results, which - if they are saved to a file - keeps memory usage constant; that matters with large input sets, though rarely with text files.

A performance tip when reading all lines at once into an array with Get-Content: use -ReadCount 0, which greatly speeds up the operation:
$splitLines = Get-Content -ReadCount 0 yourFile.txt
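
Putting it together, a minimal corrected version of the approach from your question might look like this (a sketch that uses the same placeholder file names as above):

# Sketch: the original for loop works once $splitLines is actually populated.
$splitLines = Get-Content -ReadCount 0 yourFile.txt

# Capture the for statement's output in a variable, then save it to a file.
$joinedLines = for ($i = 0; $i -lt $splitLines.Count; $i += 2) {
  # Join each pair of lines with a single space.
  # An unpaired final line is simply output by itself.
  $splitLines[$i, ($i + 1)] -join ' '
}
$joinedLines | Set-Content yourOutputFile.txt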

mklement0