I'm just getting started with multi-threading. I am running a test of my multi-thread code but I'm getting an OutOfMemory exception.

The code is converting PS to PDF using a new thread. The task takes about half a second so for this test, I'm simply sleeping the main thread for a second to make sure I don't have too many tasks running. It did more than 900 before throwing an OutOfMemory exception.

I know I need to use a thread pool, a semaphore, or the Task Parallel Library to limit my threads, but for now I'm just testing the threads themselves.

Dim sr As New StreamReader(PSTempFolder & "PDFWrite.txt")

Do While Not sr.EndOfStream

    'get PS
    Dim FileNamePS As String = sr.ReadLine

    'get folder
    Dim CustFolder As IO.DirectoryInfo
    CustFolder = GetCustFolder(FileNamePS)

    'set PDF path and name
    FileNamePDF = CustFolder.FullName & "\Statement.pdf"

    Dim n As New ConvertPDF
    n.DeletePS = False
    n.PSFileName = FileNamePS
    n.PDFFileName = FileNamePDF

    Dim t As New Thread(AddressOf n.callConvertToPDF)
    t.Start()

    'wait
    Thread.Sleep(1000)

Loop

sr.Close()

It seems it must be creating too many threads and not cleaning up the old ones. How do I clean up/dispose of the thread before creating a new one?

I suppose a second solution (in this context) would be to simply reuse the same thread (I think I can do that), but for this question I'm more interested in disposing of the thread and releasing the memory. How do I do that?

Here is the rest of the code:

Class ConvertPDF

    Public PSFileName As String
    Public PDFFileName As String
    Public DeletePS As Boolean = False

    Delegate Function ConvertToPDFdel(ByVal svPsFileName As String, _
                     ByVal svPDFName As String, _
                     ByVal DeletePS As Boolean) As Integer

    Sub callConvertToPDF()
        Dim dlgt As New ConvertToPDFdel(AddressOf ConvertToPDF)
        Dim i As Integer = dlgt.Invoke(PSFileName, PDFFileName, DeletePS)
    End Sub

End Class

Public Function ConvertToPDF(ByVal svPsFileName As String, _
                             ByVal svPDFName As String, _
                             ByVal DeletePS As Boolean) As Integer

    'check for file
    If Not IO.File.Exists(svPsFileName) Then
        Throw New ApplicationException(svPsFileName & " cannot be found")
    End If

    'delete old file
    If IO.File.Exists(svPDFName) Then IO.File.Delete(svPDFName)

    'convert
    Dim myProcInfo As New ProcessStartInfo
    myProcInfo.FileName = DanBSolutionsLocation & "Misc\GhostScript\GSWIN32C.EXE"
    myProcInfo.Arguments = "-sDEVICE=pdfwrite -q -dSAFER -dNOPAUSE -sOUTPUTFILE=""" & svPDFName & """ -dBATCH """ & svPsFileName & """"
    'Debug.Print(myProcInfo.Arguments)

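    'do the conversion
    Dim myProc As Process = Process.Start(myProcInfo)

    'wait for finish (no more than 20 seconds)
    'exitCode stays -1 if the process has not finished in time
    Dim exitCode As Integer = -1
    If myProc.WaitForExit(20000) Then exitCode = myProc.ExitCode

    myProc.Dispose()

    'delete PS only if the PDF was created
    If DeletePS Then
        If IO.File.Exists(svPDFName) Then IO.File.Delete(svPsFileName)
    End If

    Return exitCode

End Function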
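
One thing worth noting about the function above: `WaitForExit(20000)` returns `False` when the timeout elapses, and in that case the Ghostscript process keeps running after the thread has moved on. The following is a minimal sketch, not the original author's code, of checking that return value and stopping a process that overruns. `RunWithTimeout` and the `-1` timeout result are hypothetical names and conventions introduced here for illustration.

```vbnet
Imports System
Imports System.Diagnostics

Module ProcessTimeoutSketch

    ' Hypothetical helper: starts exePath with the given arguments and returns
    ' its exit code, or -1 (an assumed convention) if it overran timeoutMs.
    Function RunWithTimeout(ByVal exePath As String, _
                            ByVal arguments As String, _
                            ByVal timeoutMs As Integer) As Integer
        Dim info As New ProcessStartInfo(exePath, arguments)
        info.UseShellExecute = False
        info.CreateNoWindow = True

        ' Using releases the process handle even if an exception is thrown.
        Using proc As Process = Process.Start(info)
            If proc.WaitForExit(timeoutMs) Then
                Return proc.ExitCode
            End If

            ' Timed out: the child is still running, so stop it explicitly.
            Try
                proc.Kill()
                proc.WaitForExit()
            Catch ex As InvalidOperationException
                ' The process exited between the timeout and the Kill call.
            End Try
            Return -1
        End Using
    End Function

End Module
```

Whether an overrunning conversion should be killed or merely logged is a design choice; the point of the sketch is only that the timeout case is detected rather than silently ignored.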

EDIT: I did some more testing between GroverBoy's code and mine, and the results are inconclusive. Sometimes one is better, sometimes the other. Maybe the two really are the same and the problem is elsewhere.

The new thread starts a new process that takes 0.55 seconds to complete. If the main thread waits 1 second each iteration, that should mean that we'll never have more than one thread or one open file at a time. Why isn't this true?

What actually happens varies, and I'm not sure why. I'm testing with a loop of 100 iterations and a 1 second wait on the main thread, and I usually watch the Performance tab of the Task Manager. Sometimes I run the code and the number of threads fluctuates between 2 and 6 extra, and the Commit Charge fluctuates between 1044M and 1150M. This is what I want.

Other times I run the same code (100 iterations) and the number of threads keeps rising to more than 63 extra, and the Commit Charge keeps rising from 1044M to more than 1272M.

What can I do to ensure that the program will clean up the threads consistently?

D_Bester
  • If `callConvertToPDF` runs to completion, the new thread would exit and be cleaned up. Do you release the streams inside `callConvertToPDF`? I guess the PS or PDF files are kept open after the conversion is done. – kennyzx Oct 28 '14 at 04:29
  • @kennyzx Thanks for your comment that was helpful. I posted the rest of the code in case you're curious, but GroverBoy's answer made the difference. Thanks. – D_Bester Oct 29 '14 at 01:57
  • If I understand this code correctly you are reading in excess of 900 file names and you start both a new thread and a new process for each file. Is that correct? – Enigmativity Oct 29 '14 at 02:21
  • @Enigmativity That is correct. I did that to speed up my code. Using a new thread to start each process is ten times faster. – D_Bester Oct 29 '14 at 03:05
  • @D_Bester - You do know that starting up a new thread consumes in excess of 1MB per thread? And each process is going to be at least that much again? Getting 900 files open will consume over 1.8GB depending on how quickly the processing takes place. No wonder you're running out of memory. – Enigmativity Oct 29 '14 at 03:24
  • @Enigmativity When I saw your comment I realized my code was redundant. By simply using `Process.Start` and then not waiting for exit, I got the same speed advantage as a new thread (a new process has its own thread anyway). But this misses the point of this question, which is how to manage threads. – D_Bester Oct 29 '14 at 03:25
  • @Enigmativity I did some more testing and the results are inconclusive. Sometimes one is better sometimes the other. Maybe the two really are the same and the problem is elsewhere. – D_Bester Oct 30 '14 at 03:58
  • @Enigmativity The new thread starts a new process that takes 0.55 seconds to complete. If the main thread waits 1 second each iteration, that should mean that we'll never have more than one thread or one open file at a time. Why isn't this true? See my edit. – D_Bester Oct 30 '14 at 03:59

3 Answers

1

Another answer is to use Thread.Join without using GC.Collect. This keeps the main thread waiting until the new thread has finished.

t.Start(Params)

Params = Nothing

t.Join()

Using this method the threads and Commit Charge rose a bit and then stayed steady. They did not keep accumulating.
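
Joining each thread immediately after starting it means only one conversion runs at a time. If some parallelism is still wanted, a middle ground is to cap the number of simultaneous workers. Below is a minimal sketch using `Parallel.ForEach` with `ParallelOptions.MaxDegreeOfParallelism` (available from .NET Framework 4). `ConvertAllThrottled`, `maxWorkers`, and the `convertOne` callback are hypothetical names; the callback stands in for whatever performs a single PS-to-PDF conversion.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading.Tasks

Module ThrottleSketch

    ' Hypothetical helper: calls convertOne once per file name, running
    ' at most maxWorkers conversions at the same time. The call returns
    ' only after every item has been processed.
    Sub ConvertAllThrottled(ByVal psFiles As IEnumerable(Of String), _
                            ByVal maxWorkers As Integer, _
                            ByVal convertOne As Action(Of String))
        Dim options As New ParallelOptions()
        options.MaxDegreeOfParallelism = maxWorkers

        Parallel.ForEach(Of String)(psFiles, options, convertOne)
    End Sub

End Module
```

For example, `ConvertAllThrottled(IO.File.ReadAllLines(listPath), 4, AddressOf ConvertOneFile)` would work through the list four files at a time, where `listPath` and `ConvertOneFile` are placeholders. A suitable cap depends on the machine and is best found by measurement.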

D_Bester
  • This looks like an approach that only uses two threads at a time: main and worker. Presumably this doesn't yield the tenfold speedup you get with unlimited threads? I'd look for something in between: a solution that throttles the thread count at some optimal number N > 1 worker threads. You could determine N through experimentation, looking at the relationship between thread+process count and memory usage. This will vary on different machines and for different bitness, so no small job. Probably someone (the TPL team?) has already documented heuristics for calculating N. – groverboy Oct 31 '14 at 02:11
  • You may find these resources helpful: [“Out Of Memory” Does Not Refer to Physical Memory](http://blogs.msdn.com/b/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx), and, for memory profiling, [Identify And Prevent Memory Leaks In Managed Code](http://msdn.microsoft.com/en-us/magazine/cc163491.aspx). – groverboy Oct 31 '14 at 02:12
0

I'm making a wild guess that your code causes OutOfMemoryException because it creates but never destroys 900 (or whatever) instances of ConvertPDF. Of course it's possible that some other of your code (not shown) causes the problem. Anyway here goes ...

Let's suppose that ConvertPDF implements IDisposable, which means that after using it you need to call ConvertPDF.Dispose or, better, use ConvertPDF in a Using clause to call Dispose automatically. Your code isn't structured to do this at an appropriate time, because it has no way of knowing when callConvertToPDF has completed executing. You can restructure so that the worker thread also does the work of initializing and disposing an instance of ConvertPDF.

The code below adds a helper class Paths, to function as a parameter for the worker thread. Warning: I don't really develop in VB.NET so this may not compile :)

Class Paths
    Public FileNamePS As String
    Public FileNamePDF As String
End Class

Sub Main()
    Using sr As New StreamReader(PSTempFolder & "PDFWrite.txt")
        Do While Not sr.EndOfStream
            Dim MyPaths As Paths = New Paths()

            'get PS
            MyPaths.FileNamePS = sr.ReadLine

            'get folder
            Dim CustFolder As IO.DirectoryInfo = GetCustFolder(MyPaths.FileNamePS)

            'set PDF path and name
            MyPaths.FileNamePDF = IO.Path.Combine(CustFolder.FullName, "Statement.pdf")

            Dim t As Thread = New Thread(AddressOf ConvertPStoPdf)

            ' start the thread, passing the parameter that ConvertPStoPdf will need
            t.Start(MyPaths)

            'wait
            Thread.Sleep(1000)
        Loop
    End Using ' automatically disposes StreamReader
End Sub

Sub ConvertPStoPdf(Data As Object)
    ' get Paths instance from weak-typed parameter
    Dim MyPaths As Paths = CType(Data, Paths)

    Using C As ConvertPDF = New ConvertPDF
        C.DeletePS = False
        C.PSFileName = MyPaths.FileNamePS
        C.PDFFileName = MyPaths.FileNamePDF
        C.callConvertToPDF()
    End Using ' automatically disposes ConvertPDF
End Sub
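
The `Using` blocks above compile only when the target type implements `IDisposable`. As a minimal sketch of what that involves, here is a hypothetical class (`DisposableConverter`, not part of the original code) together with a `Using` block that disposes it. Note that `Using` exists to release resources an object owns, such as file or process handles; it does not end threads.

```vbnet
Imports System

' Hypothetical stand-in for a class that owns a resource needing cleanup.
Public Class DisposableConverter
    Implements IDisposable

    Public IsDisposed As Boolean = False

    Public Sub Convert()
        If IsDisposed Then Throw New ObjectDisposedException("DisposableConverter")
        ' ... the real work would go here ...
    End Sub

    Public Sub Dispose() Implements IDisposable.Dispose
        ' Release owned resources here; this sketch only records the call.
        IsDisposed = True
    End Sub
End Class

Module UsingSketch

    ' Returns the (already disposed) instance purely so the effect is observable.
    Function ConvertAndRelease() As DisposableConverter
        Dim converter As New DisposableConverter()
        Using converter
            converter.Convert()
        End Using ' Dispose is called here, even if Convert throws
        Return converter
    End Function

End Module
```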
groverboy
  • @D_Bester - Glad to help. Now that you've posted the code for ConvertPDF I see it doesn't implement IDisposable, i.e. the compiler won't allow it as an argument to a `Using` statement. So in what way did this answer help? – groverboy Oct 29 '14 at 10:56
  • After your comment I decided to do some more side-by-side testing. The results are inconclusive. Sometimes one is better sometimes the other. Maybe the two really are the same and the problem is elsewhere. – D_Bester Oct 30 '14 at 03:33
0

One way I found to force the memory to be reclaimed is to call GC.Collect. See Rico's blog post, "When to call GC.Collect()".

t.Start(Params)

Params = Nothing

Thread.Sleep(1000)

GC.Collect()
GC.WaitForPendingFinalizers()
GC.Collect()
GC.WaitForPendingFinalizers()

The code is the same as what is used to release Excel on this page.

I realize rule #1 is don't use GC.Collect. So is there a better answer?

With this method the threads didn't accumulate and the Commit Charge didn't rise. I won't be getting out-of-memory exception with this. But I would be glad to hear of a better answer. I really don't want to use Thread.Sleep in production code.

D_Bester