2

I am making a program that must process about 5000 strings as quickly as possible. About 2000 of these strings must be translated via a web request to mymemory.translated.net (see the code below; the JSON part is removed since it is not needed here).

          Try

              url = "http://api.mymemory.translated.net/get?q=" & Firstpart & "!&langpair=de|it&de=somemail@christmas.com"

              request = DirectCast(WebRequest.Create(url), HttpWebRequest)
              response = DirectCast(request.GetResponse(), HttpWebResponse)
              myreader = New StreamReader(response.GetResponseStream())

              Dim rawresp As String
              rawresp = myreader.ReadToEnd()
              Debug.WriteLine("Raw:" & rawresp)

          Catch ex As Exception
              MessageBox.Show(ex.ToString)

          End Try

The code itself works fine; the problem is that it blocks and needs about 1 second per string. That's more than half an hour for all my strings. I need to convert this code to a non-blocking version and make multiple calls at the same time. Could somebody please tell me how I could do that? I was thinking of a BackgroundWorker, but that wouldn't speed things up; it would just execute the code on a different thread.

Thanks!

user2452250

3 Answers

2

If you want to send 10 parallel requests, you can create 10 BackgroundWorkers or manually create 10 threads. Then iterate, and whenever a worker/thread is done, give it a new task.

I do not recommend firing 5000 parallel threads/workers. Be careful: a load like that could be interpreted by the server as spamming or an attack. Don't overdo it; maybe talk to translated.net and ask them about the workload they accept. Also think about what your machine and your internet upstream can handle.
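A minimal sketch of the "fixed number of workers, each picking up the next string when it is free" idea, assuming .NET 4.0 or later for `ConcurrentQueue`. `ProcessAll` and `FakeTranslate` are hypothetical names used only for illustration; `FakeTranslate` stands in for the real web request so the example runs without network access.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading

Module WorkerPoolSketch

    ' Hypothetical stand-in for the real web request from the question.
    Function FakeTranslate(text As String) As String
        Thread.Sleep(10) ' simulate network latency
        Return text.ToUpperInvariant()
    End Function

    ' Runs "work" over all inputs using a fixed number of threads.
    ' Note: duplicate inputs overwrite each other in the result dictionary.
    Function ProcessAll(inputs As IEnumerable(Of String),
                        workerCount As Integer,
                        work As Func(Of String, String)) As IDictionary(Of String, String)
        Dim queue As New ConcurrentQueue(Of String)(inputs)
        Dim results As New ConcurrentDictionary(Of String, String)()
        Dim workers As New List(Of Thread)()

        For i As Integer = 1 To workerCount
            Dim worker As New Thread(
                Sub()
                    Dim item As String = Nothing
                    ' Each worker takes the next string as soon as it is free
                    ' and exits once the queue is empty.
                    While queue.TryDequeue(item)
                        results(item) = work(item)
                    End While
                End Sub)
            worker.IsBackground = True
            workers.Add(worker)
            worker.Start()
        Next

        ' Block until every worker has finished.
        For Each t In workers
            t.Join()
        Next

        Return results
    End Function

    Sub Main()
        Dim inputs = New String() {"eins", "zwei", "drei", "vier"}
        Dim translated = ProcessAll(inputs, 2, AddressOf FakeTranslate)
        For Each pair In translated
            Console.WriteLine(pair.Key & " -> " & pair.Value)
        Next
    End Sub

End Module
```

The `workerCount` argument is the knob to tune against what the server tolerates. In a WinForms application, `ProcessAll` would have to be called from a background thread, since it blocks until all workers are done.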

Alexander
  • You might also want to make some use of [WebRequest.BeginGetResponse()](http://msdn.microsoft.com/en-us/library/system.net.webrequest.begingetresponse.aspx) to make the request asynchronously. – Adrian Jun 07 '13 at 06:33
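A rough sketch of what the `BeginGetResponse` suggestion could look like, for illustration only. `BuildUrl` and `TranslateAsync` are hypothetical helper names, and escaping the query text with `Uri.EscapeDataString` is an addition that is not in the question's code. The callback runs on a thread pool thread, so any UI access would still need to be marshalled back to the UI thread.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Threading

Module BeginGetResponseSketch

    ' Builds the request URL used in the question. Escaping the text is an
    ' assumption added here so that spaces and "&" survive in the query.
    Function BuildUrl(text As String) As String
        Return "http://api.mymemory.translated.net/get?q=" &
               Uri.EscapeDataString(text) &
               "!&langpair=de|it&de=somemail@christmas.com"
    End Function

    ' Starts the request and returns immediately. One of the two callbacks
    ' is invoked later on a thread pool thread.
    Sub TranslateAsync(text As String,
                       onDone As Action(Of String),
                       onError As Action(Of Exception))
        Dim request = DirectCast(WebRequest.Create(BuildUrl(text)), HttpWebRequest)
        request.BeginGetResponse(
            Sub(ar As IAsyncResult)
                Try
                    Using response = request.EndGetResponse(ar)
                        Using reader As New StreamReader(response.GetResponseStream())
                            onDone(reader.ReadToEnd())
                        End Using
                    End Using
                Catch ex As Exception
                    onError(ex)
                End Try
            End Sub, Nothing)
    End Sub

    Sub Main()
        ' The event only keeps this console demo alive until the callback fires.
        Dim finished As New ManualResetEvent(False)
        TranslateAsync("Hallo",
                       Sub(raw)
                           Console.WriteLine("Raw:" & raw)
                           finished.Set()
                       End Sub,
                       Sub(ex)
                           Console.WriteLine(ex.Message)
                           finished.Set()
                       End Sub)
        finished.WaitOne()
    End Sub

End Module
```

Starting many such requests in a loop would still be subject to the per-host connection limit, so this does not by itself decide how many requests run at once.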
2

The problem is that you aren't held back only by the maximum number of concurrent operations. HttpWebRequests are throttled by default (I believe the default policy allows only 2 connections per host at any given time), so you have to override that behaviour too. Please refer to the code below.

Imports System.Diagnostics
Imports System.IO
Imports System.Net
Imports System.Threading
Imports System.Threading.Tasks

Public Class Form1

  ''' <summary>
  ''' Test entry point.
  ''' </summary>
  Private Sub Form1_Load() Handles MyBase.Load
    ' Generate enough words for us to test throughput.
    Dim words = Enumerable.Range(1, 100) _
      .Select(Function(i) "Word" + i.ToString()) _
      .ToArray()

    ' Maximum theoretical number of concurrent requests.
    Dim maxDegreeOfParallelism = 24
    Dim sw = Stopwatch.StartNew()

    ' Capture information regarding current SynchronizationContext
    ' so that we can perform thread marshalling later on.
    Dim uiScheduler = TaskScheduler.FromCurrentSynchronizationContext()
    Dim uiFactory = New TaskFactory(uiScheduler)

    Dim transformTask = Task.Factory.StartNew(
      Sub()
        ' Apply the transformation in parallel.
        ' Parallel.ForEach implements clever load
        ' balancing, so, since each request won't
        ' be doing much CPU work, it will spawn
        ' many parallel streams - likely more than
        ' the number of CPUs available.
        Parallel.ForEach(words, New ParallelOptions With {.MaxDegreeOfParallelism = maxDegreeOfParallelism},
          Sub(word)
            ' We are running on a thread pool thread now.
            ' Be careful not to access any UI until we hit
            ' uiFactory.StartNew(...)

            ' Perform transformation.
            Dim url = "http://api.mymemory.translated.net/get?q=" & word & "!&langpair=de|it&de=somemail@christmas.com"
            Dim request = DirectCast(WebRequest.Create(url), HttpWebRequest)

            ' Note that unless you specify this explicitly,
            ' the framework will use the default and you
            ' will be limited to 2 parallel requests
            ' regardless of how many threads you spawn.
            request.ServicePoint.ConnectionLimit = maxDegreeOfParallelism

            Using response = DirectCast(request.GetResponse(), HttpWebResponse)
              Using myreader As New StreamReader(response.GetResponseStream())
                Dim rawresp = myreader.ReadToEnd()

                Debug.WriteLine("Raw:" & rawresp)

                ' Transform the raw response here.
                Dim processed = rawresp

                uiFactory.StartNew(
                  Sub()
                    ' This is running on the UI thread,
                    ' so we can access the controls,
                    ' i.e. add the processed result
                    ' to the data grid.
                    Me.Text = processed
                  End Sub, TaskCreationOptions.PreferFairness)
              End Using
            End Using
          End Sub)
      End Sub)

    transformTask.ContinueWith(
      Sub(t As Task)
        ' Always stop the stopwatch.
        sw.Stop()

        ' Again, we are back on the UI thread, so we
        ' could access UI controls if we needed to.
        If t.Status = TaskStatus.Faulted Then
          Debug.Print("The transformation errored: {0}", t.Exception)
        Else
          Debug.Print("Operation completed in {0} s.", sw.ElapsedMilliseconds / 1000)
        End If
      End Sub,
      uiScheduler)
  End Sub

End Class
Kirill Shlenskiy
  • Thanks! If I understand it properly, this code will translate 10 strings in parallel without blocking, right? – user2452250 Jun 07 '13 at 06:40
  • No. It will be 10 strings in parallel (so it will finish much quicker), but it will still block. If you don't want to block, wrap the Parallel.ForEach call in a Task and await. – Kirill Shlenskiy Jun 07 '13 at 06:42
  • @user2452250 I've changed my example shifting the call to `Parallel.ForEach()` into the thread pool. I've also incorporated the `MaxDegreeOfParallelism` feature to make sure that we don't flood the web request queue unnecessarily. Wire up the continuation to do what you need and you're good to go. – Kirill Shlenskiy Jun 07 '13 at 07:04
  • Hey, it's working great! It takes about 2 minutes to run through all elements! Can I use a number greater than 10, or is that not recommended? And how would I implement the thread to make it non-blocking? Thanks for your help, I really appreciate it! – user2452250 Jun 07 '13 at 07:05
  • @user2452250, I've already changed my example to refrain from blocking the UI. It's still a blocking call, but it's blocking on the thread pool, so you won't really notice. As for setting `request.ServicePoint.ConnectionLimit` to be greater than 10 - sure you can do that. Chances are the web service author will hate you though :) Experiment with different numbers. Just be sure to set `MaxDegreeOfParallelism` accordingly - perhaps move that number into a class-wide constant. – Kirill Shlenskiy Jun 07 '13 at 07:09
  • Nah, he won't hate me; I have to run this code only once a month! Testing! Will report back! Thanks! – user2452250 Jun 07 '13 at 07:12
  • I keep getting "Task is not declared". What namespace am I supposed to import? – user2452250 Jun 07 '13 at 07:20
  • @user2452250: Import `System.Threading.Tasks`. I find it bizarre though. `Parallel` and `Task` are both declared in the same namespace and in the same library (mscorlib.dll), so how come you're able to use one but not the other? – Kirill Shlenskiy Jun 07 '13 at 07:21
  • no idea.. pretty noob here.. will report back in a few minutes! – user2452250 Jun 07 '13 at 07:27
  • Argh, I can't access the interface due to cross-thread problems. Well, I think 2 minutes is still very acceptable! Thanks for all your help! Marking as accepted! – user2452250 Jun 07 '13 at 07:29
  • @user2452250, you can access the interface inside the delegate Sub passed to ContinueWith - that code runs on the UI thread. If you have another requirement, i.e. communicating progress throughout the operation - then that would add a bit of complexity to the code. – Kirill Shlenskiy Jun 07 '13 at 07:33
  • Nope, no progress; all I have to do is add these lines to the code: ` Me.DataGridView1.Rows.Add(Firstpart, jResults("responseData")("translatedText"), "click to add") Application.DoEvents()` Thanks!! – user2452250 Jun 07 '13 at 07:37
  • Do you need to do that as you go, or when you're done processing? Also: avoid `Application.DoEvents` like the plague. – Kirill Shlenskiy Jun 07 '13 at 07:48
  • Still trying, but I am afraid I am not able to implement it as you said. Could you please help me? Thanks – user2452250 Jun 07 '13 at 08:17
  • Sample code amended to include a short UI block within the parallel loop - that is where you can add rows to your GridView. You do not *need* Application.DoEvents when you're not blocking the UI because the message loop (which runs on the UI thread) will happily keep pumping. Some people use Application.DoEvents during long-running operations on the UI thread to give the user an illusion of a responsive UI. As for what problems that can create, I can't explain it better than Hans Passant: http://stackoverflow.com/questions/5181777/use-of-application-doevents Good luck. – Kirill Shlenskiy Jun 07 '13 at 08:19
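The suggestion in the comments to wrap `Parallel.ForEach` in a Task and await it could look roughly like the sketch below. It assumes .NET 4.5 or later (for `Async`/`Await` and `Task.Run`). `TranslateAllAsync`, `FakeTranslate` and the constant `MaxParallelRequests` are hypothetical names, and `FakeTranslate` replaces the real web request so the example runs offline.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading
Imports System.Threading.Tasks

Module AwaitSketch

    ' Single place to tune the degree of parallelism (hypothetical name).
    Const MaxParallelRequests As Integer = 10

    ' Hypothetical stand-in for the real web request.
    Function FakeTranslate(word As String) As String
        Thread.Sleep(10) ' simulate network latency
        Return word & "-it"
    End Function

    ' The blocking Parallel.ForEach runs on the thread pool; the caller
    ' awaits the task instead of blocking its own thread.
    Async Function TranslateAllAsync(words As IEnumerable(Of String)) As Task(Of ConcurrentBag(Of String))
        Dim results As New ConcurrentBag(Of String)()
        Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = MaxParallelRequests}

        Await Task.Run(
            Sub()
                Parallel.ForEach(words, options,
                                 Sub(word As String) results.Add(FakeTranslate(word)))
            End Sub)

        Return results
    End Function

    Sub Main()
        ' .Result blocks, which is acceptable only in this console demo.
        ' In a form, use Await inside an Async event handler instead.
        Dim results = TranslateAllAsync(New String() {"eins", "zwei", "drei"}).Result
        For Each item In results
            Console.WriteLine(item)
        Next
    End Sub

End Module
```

When awaited from an `Async` event handler on a form, the code after `Await` resumes on the UI thread, so controls can be updated there without extra marshalling. The results in the bag are not guaranteed to be in input order.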
1

I would create a Task for every request, so you can have a Callback for every call using ContinueWith:

  For Each InputString As String In myCollectionString

            ' Copy the loop variable so that each task captures its own value.
            Dim current As String = InputString

            Tasks.Task(Of String).Factory.StartNew(Function()

                    Dim rawResp As String = String.Empty

                    Try

                      Dim url As String = "http://api.mymemory.translated.net/get?q=" & current & "!&langpair=de|it&de=somemail@christmas.com"
                      Dim request = DirectCast(WebRequest.Create(url), HttpWebRequest)

                      ' Dispose the response and reader once the body has been read.
                      Using response = DirectCast(request.GetResponse(), HttpWebResponse)
                        Using myreader As New StreamReader(response.GetResponseStream())
                          rawResp = myreader.ReadToEnd()
                        End Using
                      End Using

                      Debug.WriteLine("Raw:" & rawResp)

                    Catch ex As Exception
                      MessageBox.Show(ex.ToString)

                    End Try

                    Return rawResp

              End Function).ContinueWith _
              (Sub(task As Tasks.Task(Of String))
                 ' Do something with the result
                 Console.WriteLine(task.Result)
              End Sub)

        Next
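The loop above starts the tasks but does not keep track of them, so there is no single point at which all translations are known to be finished. One possible way to handle that, sketched here under assumptions, is to collect the tasks and wait on them together. `StartAll` and `FakeTranslate` are hypothetical names, and `FakeTranslate` stands in for the web request.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading.Tasks

Module WaitAllSketch

    ' Hypothetical stand-in for the real web request.
    Function FakeTranslate(text As String) As String
        Return text & "-it"
    End Function

    ' Starts one task per input and returns them in input order.
    Function StartAll(inputs As IEnumerable(Of String)) As Task(Of String)()
        Dim tasks As New List(Of Task(Of String))()
        For Each item In inputs
            ' Copy the loop variable so each lambda captures its own value.
            Dim current = item
            tasks.Add(Task(Of String).Factory.StartNew(Function() FakeTranslate(current)))
        Next
        Return tasks.ToArray()
    End Function

    Sub Main()
        Dim tasks = StartAll(New String() {"eins", "zwei"})

        ' Blocks the calling thread until every task has completed.
        Task.WaitAll(tasks)

        For Each t In tasks
            Console.WriteLine(t.Result)
        Next
    End Sub

End Module
```

`Task.WaitAll` blocks, so in a UI application it should not be called on the UI thread; `Task.Factory.ContinueWhenAll` is a non-blocking alternative. How many of these tasks actually run at once is decided by the thread pool, and the per-host connection limit mentioned in the other answer still applies to the real requests.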
Carlos Landeras