
I have a folder containing COBOL source code, and I am trying to extract the comments from the files. The code below works for me and extracts the comments into a single file.

$files = Get-ChildItem -Path 'C:\Users\TextFiles' -File -Recurse -ErrorAction SilentlyContinue
$result = foreach ($file in $files)
{
    # Header line identifying the source file.
    '{0} | {1} ' -f $file.DirectoryName, $file.BaseName
    $content = Get-Content $file.FullName
    foreach ($line in $content) {
        # In fixed-format COBOL, '*' in column 7 (index 6) marks a comment line.
        if ($line[6] -eq '*') {
            $line
        }
    }
}
$result | Out-File 'C:\Users\Desktop\Comments.txt'

But I need to extract the comments for each file separately. Any help would be much appreciated.

I am new to PowerShell, so I might be way off course with my logic and understanding of PowerShell. Please point me in the general direction and I'll go and do some more reading. Thanks in advance!

Mathias R. Jessen
  • How would you like to name the output files? – Mathias R. Jessen Jul 19 '23 at 18:15
  • Welcome to StackOverflow. Please choose your tags by asking "is this about that?" (the question _is_ about PS, but not about COBOL) and take the [tour](https://stackoverflow.com/tour) to write even better questions (yours is quite good for a first one) and thereby get better answers. – Simon Sobisch Jul 19 '23 at 19:11

2 Answers


Here's an alternative to your own answer that should perform significantly better:

# Specify your paths here.
$inDir  = 'C:\Users\TextFiles'
$outDir = 'C:\Users\Desktop\Comments'

Get-ChildItem -Recurse -File -ErrorAction SilentlyContinue -LiteralPath $inDir |
  ForEach-Object {
    # Determine the output file name for the input file at hand.
    $outFile = Join-Path $outDir "$($_.BaseName).txt"
    # Determine the header line for the output file.
    $header = '{0} | {1} ' -f $_.DirectoryName, $_.BaseName
    # Determine the full content of the file....
    $content =
      , $header + ($_ | Get-Content -ReadCount 0).Where({ $_[6] -eq '*' })
    # ... and save it to the output file.
    Set-Content -Encoding Unicode -LiteralPath $outFile -Value $content
  }
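
To see what the filtering expression does in isolation, here is a minimal sketch that applies it to a few in-memory lines instead of a file. The sample lines and the header text are made up for illustration; only the `$_[6] -eq '*'` test and the `, $header + ...` construction are taken from the code above.

```powershell
# Hypothetical sample lines in fixed-format COBOL layout (not from a real program).
$lines = @(
  '000100 IDENTIFICATION DIVISION.'
  '000200* This is a comment line.'
  '000300 PROGRAM-ID. DEMO.'
  '000400* Another comment.'
  'short'
)

# Hypothetical header; the real code derives it from the file's directory and base name.
$header = 'C:\Some\Dir | DEMO '

# Keep only the lines whose 7th character (index 6) is '*'.
# Lines shorter than 7 characters yield $null for $_[6] and are skipped
# (assuming strict mode is not enabled).
$comments = $lines.Where({ $_[6] -eq '*' })

# The unary comma wraps the header in a one-element array,
# so + performs array concatenation rather than string concatenation.
$content = , $header + $comments

$content
```

With these sample lines, `$content` holds the header followed by the two comment lines.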

What accounts for the performance improvements:

  • Using -ReadCount 0 with Get-Content reads all of the file's lines at once into a single array (by default, the lines are streamed one by one and collected in an array only later, if needed).

  • The intrinsic .Where() method is a faster alternative to the Where-Object cmdlet for filtering collections.

  • The unary form of , (the array constructor or "comma" operator), combined with the + operator, constructs a single array that holds the header line followed by all matching comment lines, if any. This array is the full set of lines to write to the output file.

  • Saving the output file:

    • For input that is already text, Set-Content performs better than Out-File - see this answer for details.

      • -Encoding Unicode is used to match the default encoding of Out-File in Windows PowerShell. (In PowerShell (Core) 7+, all cmdlets default to BOM-less UTF-8.) Adjust as needed.
    • Passing the lines to write as an array to the -Value parameter of Set-Content is much faster than providing the array via the pipeline (in which case the array would be enumerated, i.e. its elements would be sent to Set-Content one by one).

    • It is especially important to avoid calling file-writing cmdlets once per output line with -Append, as that involves opening and closing the file every time, which is much slower than providing all file content to a single invocation of such a cmdlet.
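
The last two points can be illustrated with a small, self-contained sketch that writes to a temporary folder. The folder name, file names, and sample lines are hypothetical; the sketch contrasts a single Set-Content call that receives the whole array via -Value with the per-line -Append pattern that the list above advises against.

```powershell
# Hypothetical scratch folder under the system temp directory.
$tmpDir = Join-Path ([System.IO.Path]::GetTempPath()) 'comment-demo'
$null = New-Item -ItemType Directory -Force -Path $tmpDir

# Made-up output lines standing in for a header plus extracted comments.
$content = 'header | DEMO', '000200* First comment.', '000400* Second comment.'

# Preferred: one call, with the whole array passed as an argument.
$fast = Join-Path $tmpDir 'fast.txt'
Set-Content -LiteralPath $fast -Value $content

# Slower pattern: one cmdlet call per line, each reopening the file.
$slow = Join-Path $tmpDir 'slow.txt'
# Remove leftovers first so -Append does not add to a previous run's output.
Remove-Item -LiteralPath $slow -ErrorAction SilentlyContinue
foreach ($line in $content) {
  $line | Out-File -LiteralPath $slow -Append
}

# Both files end up containing the same lines.
$fastLines = @(Get-Content -LiteralPath $fast)
$slowLines = @(Get-Content -LiteralPath $slow)
```

On a handful of lines the difference is negligible; wrapping each variant in Measure-Command on real input is one way to check how much it matters for your files.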

mklement0

After posting here, I tried a few approaches, and this worked for me.

$files = Get-ChildItem -Path 'Folder path for input files' -File -Recurse -ErrorAction SilentlyContinue
$Variable = @()
$result = foreach ($file in $files)
{
    $Variable = '{0} | {1} ' -f $file.DirectoryName, $file.BaseName
    $Variable | Out-File "$('Folder path for output files')\$($file.BaseName).txt" -Append
    $content = Get-Content $file.FullName

    foreach ($line in $content) {
        if ($line[6] -eq '*') {
            $line | Out-File "$('Folder path for output files')\$($file.BaseName).txt" -Append
        }
    }
}
$result
mklement0
  • You probably don't want `-Append` in your _first_ `Out-File` call, as that will append to potentially preexisting content from a _previous run_. While you do need `-Append` in the `foreach` loop, note that calling `Out-File` _once per output line_ is quite inefficient. I have completely rewritten my answer to show a better-performing solution, which avoids this pitfall and uses additional techniques to improve performance. – mklement0 Jul 20 '23 at 14:18
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 22 '23 at 23:03