1

I use the following command in PowerShell to convert files in the background, but I would like to log all the results in one file. Currently, -RedirectStandardOutput replaces the file on each run.

foreach ($l in gc ./files.txt) {Start-Process -FilePath "c:\Program Files (x86)\calibre2\ebook-convert.exe" -Argumentlist "'$l' '$l.epub'" -Wait -WindowStyle Hidden -RedirectStandardOutput log.txt}

I tried with a redirect but then the log is empty. If possible I would like to keep it a one-liner.

foreach ($l in gc ./files.txt) {Start-Process -FilePath "c:\Program Files (x86)\calibre2\ebook-convert.exe" -Argumentlist "`"$l`" `"$l.epub`"" -Wait -WindowStyle Hidden *> log.txt}
stackprotector
peter

4 Answers

2

If sequential, synchronous execution is acceptable, you can simplify your command to use a single output redirection. The assumption is that ebook-convert.exe is a console-subsystem application, which PowerShell therefore executes synchronously (in a blocking manner):

Get-Content ./files.txt | ForEach-Object {
  & 'c:\Program Files (x86)\calibre2\ebook-convert.exe' $_ "$_.epub" 
} *> log.txt

Placing * before > tells PowerShell to redirect all output streams, which in the case of external programs means both stdout and stderr.

If you want to control the character encoding, use Out-File (for which > is effectively an alias) with its -Encoding parameter; or, preferably for text output (which external-program output always is in PowerShell), use Set-Content. To also capture stderr output, append *>&1 to the command in the pipeline segment before the Out-File / Set-Content call.

Note that PowerShell never passes raw output from external programs through to files: the output is always first decoded into .NET strings, based on the encoding stored in [Console]::OutputEncoding (the system's active legacy OEM code page by default), and then re-encoded on saving to a file, using the file-writing cmdlet's own defaults unless overridden with -Encoding; see this answer for more information.
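As a minimal sketch of the encoding-controlled variant described above: the stand-in command, the `$emit` variable, and the file name `log-utf8.txt` are illustrative assumptions, not part of the original command; substitute the real ebook-convert.exe call. It merges all streams with *>&1 and lets Set-Content decide the file's encoding.

```powershell
# Assumption: if the external program emits UTF-8, first tell PowerShell to
# decode its output accordingly by uncommenting the following line:
# [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()

# Hypothetical stand-in for the real conversion command: it writes one line
# to stdout and one line to stderr.
$emit = if ($env:OS -eq 'Windows_NT') {
  { cmd /c 'echo to-stdout& echo to-stderr 1>&2' }
} else {
  { sh -c 'echo to-stdout; echo to-stderr >&2' }
}

# *>&1 merges all streams into the success output stream, so stderr lines
# travel through the pipeline too. "$_" converts the error records that wrap
# stderr lines back to plain strings. Set-Content -Encoding then determines
# how the log file is encoded.
& $emit *>&1 | ForEach-Object { "$_" } | Set-Content -Encoding utf8 log-utf8.txt
```

The relative order of stdout and stderr lines in the file isn't guaranteed to match the order in which the program wrote them.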


If you want asynchronous, parallel execution (such as via Start-Process, which is asynchronous by default), your best bet is to:

  • write to separate (temporary) files:

    • Pass a different output file to -RedirectStandardOutput / -RedirectStandardError in each invocation.

    • Note that if you want to merge stdout and stderr output and capture it in the same file, you'll have to call your .exe file via a shell (possibly another PowerShell instance) and use its redirection features; for PowerShell, it would be *>log.txt; for cmd.exe (as shown below), it would be > log.txt 2>&1

  • wait for all launched processes to finish:

    • Pass -PassThru to Start-Process and collect the process-information objects returned.

    • Then use Wait-Process to wait for all processes to terminate; use the -Timeout parameter as needed.

  • and then merge them into a single log file.

Here's an implementation:

$procsAndLogFiles = 
  Get-Content ./files.txt | ForEach-Object -Begin { $i = 0 } {
    # Create a distinct log file for each process,
    # and return its name along with a process-information object representing
    # each process as a custom object.
    $logFile = 'log{0:000}.txt' -f ++$i
    [pscustomobject] @{
      LogFile = $logFile
      Process = Start-Process -PassThru -WindowStyle Hidden `
                  -FilePath 'cmd.exe' `
                  -Argumentlist "/c `"`"c:\Program Files (x86)\calibre2\ebook-convert.exe`" `"$_`" `"$_.epub`" >`"$logFile`" 2>&1`"" 
    }
  }

# Wait for all processes to terminate.
# Add -Timeout and error handling as needed.
$procsAndLogFiles.Process | Wait-Process

# Merge all log files.
Get-Content -LiteralPath $procsAndLogFiles.LogFile > log.txt

# Clean up.
Remove-Item -LiteralPath $procsAndLogFiles.LogFile

If you want throttled parallel execution, so as to limit how many background processes can run at a time:

# Limit how many background processes may run in parallel at most.
$maxParallelProcesses = 10

# Initialize the log file.
# Add -Force to unconditionally replace an existing file.
New-Item log.txt

# Initialize the list in which those input files whose conversion
# failed due to timing out are recorded.
$allTimedOutFiles = [System.Collections.Generic.List[string]]::new()

# Process the input files in batches of $maxParallelProcesses
Get-Content -ReadCount $maxParallelProcesses ./files.txt |
  ForEach-Object {

    $i = 0
    $launchInfos = foreach ($file in $_) {
      # Create a distinct log file for each process,
      # and return its name along with the input file name / path, and 
      # a process-information object representing each process, as a custom object.
      $logFile = 'log{0:000}.txt' -f ++$i
      [pscustomobject] @{
        InputFile = $file
        LogFile = $logFile
        Process = Start-Process -PassThru -WindowStyle Hidden `
          -FilePath 'cmd.exe' `
          -ArgumentList "/c `"`"c:\Program Files (x86)\calibre2\ebook-convert.exe`" `"$file`" `"$file.epub`" >`"$logFile`" 2>&1`""
      }
    }

    # Wait for the processes to terminate, with a timeout.
    $launchInfos.Process | Wait-Process -Timeout 30 -ErrorAction SilentlyContinue -ErrorVariable errs

    # If not all processes terminated within the timeout period,
    # forcefully terminate those that didn't.
    if ($errs) {
      $timedOut = $launchInfos | Where-Object { -not $_.Process.HasExited }
      Write-Warning "Conversion of the following input files timed out; the processes will be killed:`n$($timedOut.InputFile)"
      $timedOut.Process | Stop-Process -Force
      $allTimedOutFiles.AddRange(@($timedOut.InputFile))
    }

    # Merge all temp. log files and append to the overall log file.
    $tempLogFiles = Get-Item -ErrorAction Ignore -LiteralPath ($launchInfos.LogFile | Sort-Object)
    $tempLogFiles | Get-Content >> log.txt

    # Clean up.
    $tempLogFiles | Remove-Item

  }

# * log.txt now contains all combined logs
# * $allTimedOutFiles now contains all input file names / paths 
#   whose conversion was aborted due to timing out.

Note that the above throttling technique isn't optimal, because each batch of inputs is waited for together, and only then is the next batch started. A better approach is to launch a new process as soon as one of the available parallel "slots" opens up, as shown in the next section; however, note that PowerShell (Core) 7+ is required.


PowerShell (Core) 7+: Efficiently throttled parallel execution, using ForEach-Object -Parallel:

PowerShell (Core) 7+ introduced thread-based parallelism to the ForEach-Object cmdlet, via the -Parallel parameter, which has built-in throttling that defaults to a maximum of 5 threads, but can be controlled explicitly via the -ThrottleLimit parameter.

This enables efficient throttling, as a new thread is started as soon as an available slot opens up.

The following is a self-contained example that demonstrates the technique; it works on both Windows and Unix-like platforms:

  • Inputs are 9 integers, and the conversion process is simulated simply by sleeping a random number of seconds between 1 and 9, followed by echoing the input number.

  • A timeout of 6 seconds is applied to each child process, meaning that a random number of child processes will time out and be killed.

#requires -Version 7

# Use ForEach-Object -Parallel to launch child processes in parallel,
# limiting the number of parallel threads (from which the child processes are 
# launched) via -ThrottleLimit.
# -AsJob returns a single job whose child jobs track the threads created.
$job = 
 1..9 | ForEach-Object -ThrottleLimit 3 -AsJob -Parallel {
  # Determine a temporary, thread-specific log file name.
  $logFile = 'log_{0:000}.txt' -f $_
  # Pick a random sleep time that may or may not be smaller than the timeout period.
  $sleepTime = Get-Random -Minimum 1 -Maximum 10  # -Maximum is exclusive, so this yields 1..9
  # Launch the external program asynchronously and save information about
  # the newly launched child process.
  if ($env:OS -eq 'Windows_NT') {
    $ps = Start-Process -PassThru -WindowStyle Hidden cmd.exe "/c `"timeout $sleepTime >NUL & echo $_ >$logFile 2>&1`""
  }
  else { # macOS, Linux
    $ps = Start-Process -PassThru sh "-c `"{ sleep $sleepTime; echo $_; } >$logFile 2>&1`""
  }
  # Wait for the child process to exit within a given timeout period.
  $ps | Wait-Process -Timeout 6 -ErrorAction SilentlyContinue
  # Check if a timeout has occurred (implied by the process not having exited yet)
  $timedOut = -not $ps.HasExited
  if ($timedOut) {
    # Note: Only [Console]::WriteLine produces immediate output, directly to the display.
    [Console]::WriteLine("Warning: Conversion timed out for: $_")
    # Kill the timed-out process.
    $ps | Stop-Process -Force
  }
  # Construct and output a custom object that indicates the input at hand,
  # the associated log file, and whether a timeout occurred.
  [pscustomobject] @{
    InputFile = $_
    LogFile = $logFile
    TimedOut = $timedOut
  }
 }

# Wait for all child processes to exit or be killed
$processInfos = $job | Receive-Job -Wait -AutoRemoveJob

# Merge all temporary log files into an overall log file.
$tempLogFiles = Get-Item -ErrorAction Ignore -LiteralPath ($processInfos.LogFile | Sort-Object)
$tempLogFiles | Get-Content > log.txt

# Clean up the temporary log files.
$tempLogFiles | Remove-Item

# To illustrate the results, show the overall log file's content
# and which inputs caused timeouts.
[pscustomobject] @{
  CombinedLogContent = Get-Content -Raw log.txt
  InputsThatFailed = ($processInfos | Where-Object TimedOut).InputFile
} | Format-List

# Clean up the overall log file.
Remove-Item log.txt
mklement0
  • Thanks, but when I run your script I get an error for each entry. Get-Content : Cannot find path 'log001.txt' because it does not exist. At E:\ebooks\english\_convert\mobi\generic_convert_in_place.ps1:20 char:1 + Get-Content -LiteralPath $procsAndLogFiles.LogFile > log.txt + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : ObjectNotFound: (log001.txt:String) [Get-Content], ItemNotFoundException + FullyQualifiedErrorId :PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand – peter May 14 '22 at 08:11
  • I can't use asynchronous execution unless it is limited to a maximum number of files processed at the same time, which would be optimal but is difficult to implement; that's why I use the -Wait parameter in my question – peter May 14 '22 at 08:17
  • @peter, yeah, throttling of the background processes would require substantially more work. As for the error with `Get-Content`: I got the quoting in the `cmd.exe` call wrong; please see my update, which should work now. I've also added information about `*>` in PowerShell to capture both stdout and stderr. – mklement0 May 14 '22 at 13:19
  • Thanks, but it seems the invocation of cmd isn't necessary, see the solution I posted as an answer; the only problem now is that the timeout works but doesn't close everything – peter May 15 '22 at 09:55
  • Wait-Process : This command stopped operation because process "ebook-convert (3232)" is not stopped in the specified time-out. At E:\ebooks\english\_convert\mobi\generic_convert_in_place.ps1:16 char:11 + | Wait-Process -Timeout 30 + ~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : CloseError: (System.Diagnost...(ebook-convert):Process) [Wait-Process], TimeoutException + FullyQualifiedErrorId : ProcessNotTerminated,Microsoft.PowerShell.Commands.WaitProcessCommand – peter May 15 '22 at 09:55
  • Since this is a long-running job, I still need to kill some processes manually every few hours or so – peter May 15 '22 at 10:00
  • @peter, you need `cmd` (or a nested PowerShell call) if you want to capture _both_ stdout and stderr, _combined_. If you don't care about stderr, `-RedirectStandardOutput` is enough; if you want to capture stderr _separately_, also use `-RedirectStandardError`. Re `Wait-Process`: I'll respond separately. – mklement0 May 15 '22 at 15:08
  • @peter, please see my update re throttling the number of background processes and how to handle the errors reported by `Wait-Process`. – mklement0 May 15 '22 at 17:34
  • Thanks again, I had already given up doing it in PowerShell since the killing of the process often went wrong; I rewrote it in Ruby and that works as intended, including running a limited number of background processes concurrently. I'll try your new version tomorrow. – peter May 16 '22 at 00:16
  • @peter, I've added another solution to the answer, which uses _efficient_ throttling, based on PowerShell (Core) 7+ features. – mklement0 May 17 '22 at 02:22
1

You can use redirection and append to files if you don't use Start-Process, but a direct invocation:

foreach ($l in gc ./files.txt) {& 'C:\Program Files (x86)\calibre2\ebook-convert.exe' "$l" "$l.epub" *>> log.txt}
peter
stackprotector
  • it needs to run in the background so that it doesn't hinder, Start-Process does that with the -WindowStyle hidden – peter May 13 '22 at 13:04
  • But with a slight modification it works without taking focus, I'll edit your answer. Thanks – peter May 13 '22 at 13:14
  • But that means I'll have to search for another solution for another requirement that was also fulfilled with Start-Process, namely the use of a -Timeout parameter – peter May 13 '22 at 13:16
0

For the moment I'm using an adaptation of mklement0's answer. ebook-convert.exe often hangs, so I need to close it down if the process takes longer than the designated time. This needs to run asynchronously because of the number of files and the processor time taken (5 to 25% depending on the conversion). The timeout needs to be per file, not on the whole of the jobs.

$procsAndLogFiles = 
  Get-Content ./files.txt | ForEach-Object -Begin { $i = 0 } {
    # Create a distinct log file for each process,
    # and return its name along with a process-information object representing
    # each process as a custom object.
    $logFile = 'd:\temp\log{0:000}.txt' -f ++$i
    Write-Host "$(Get-Date) $_"
    [pscustomobject] @{
      LogFile = $logFile
      Process = Start-Process `
        -PassThru `
        -FilePath "c:\Program Files (x86)\calibre2\ebook-convert.exe" `
        -Argumentlist "`"$_`" `"$_.epub`"" `
        -WindowStyle Hidden `
        -RedirectStandardOutput $logFile `
        | Wait-Process -Timeout 30
    }
  }

# Wait for all processes to terminate.
# Add -Timeout and error handling as needed.
$procsAndLogFiles.Process

# Merge all log files.
Get-Content -LiteralPath $procsAndLogFiles.LogFile > log.txt

# Clean up.
Remove-Item -LiteralPath $procsAndLogFiles.LogFile
peter
  • Problem here is the timeout with the consequent killing of the process often doesn't work with ebook-convert and the number of phantom processes kept growing over time. See my other answer for a working Ruby version. – peter May 16 '22 at 00:18
0

Since the problem in my other answer was not completely solved (not all processes that exceed the timeout limit are killed), I rewrote it in Ruby. It's not PowerShell, but if you land on this question and know Ruby, it could help you. I believe it's the use of threads that solves the killing issue.

require 'logger'

LOG        = Logger.new("log.txt")
PROGRAM    = 'c:\Program Files (x86)\calibre2\ebook-convert.exe'
LIST       = 'E:\ebooks\english\_convert\mobi\files.txt'
TIMEOUT    = 30
MAXTHREADS = 6

def run file, log: nil
  output = ""
  # Merge stderr into stdout once, so that both end up in the log.
  command  = %Q{"#{PROGRAM}" "#{file}" "#{file}.epub" 2>&1}
  IO.popen(command) do |io|
    begin
      while (line=io.gets) do
        output += line
        log.info line.chomp if log
      end
    rescue => ex
      log.error ex.message
      system("taskkill /f /pid #{io.pid}") rescue log.error $@
    end
  end
  if File.exist? "#{file}.epub"
    puts "converted   #{file}.epub" 
    File.delete(file)
  else
    puts "error       #{file}" 
  end
  output
end

threads = []

File.readlines(LIST).each do |file|
    file.chomp! # remove line feed
  # some checks
    if !File.exist? file
        puts "not found   #{file}"
        next
    end
    if File.exist? "#{file}.epub"
        puts "skipping    #{file}"
        File.delete(file) if File.exist? file
        next
    end

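    # go on with the conversion
    thread = Thread.new {run(file, log: LOG)}
    threads << thread
    next if threads.length < MAXTHREADS
    # The pool is full: give each thread up to TIMEOUT seconds,
    # kill any that is still running, then empty the pool.
    threads.each do |t|
        t.join(TIMEOUT)
        t.kill if t.alive?
    end
    threads.clear
end

# Also wait for the threads of the last, partially filled batch.
threads.each do |t|
    t.join(TIMEOUT)
    t.kill if t.alive?
end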
peter