-3

I'm a beginner to C# & I'm using HttpWebRequest to obtain a web page source. Well I'm using regex to scan the source code for content inside the html element . Basically the regex looks like this in C#.

Match m = Regex.Match(result, @"^(.*?<form .*?>(.*?)</form>.*?)+$", RegexOptions.Singleline);

The only problem that i'm facing is that until this process is completed my application freezes. Will background threading help me ? if so could you please be kind to help me out with a snippet for implementing it ? .. It will be great if i can display a progress bar or something for the user.

private void button1_Click(object sender, EventArgs e) 
        {

            Thread backgroundThread = new Thread(
                new ThreadStart(() =>
                {
                    Match m = Regex.Match(result, @"^(.*?<form .*?>(.*?)</form>.*?)+$", RegexOptions.Singleline);

                    foreach (var capture in m.Groups[2].Captures)
                    {
                        forms.Add(capture.ToString());
                    }

                    MessageBox.Show("Thread completed!");

                    if (progressBar.InvokeRequired)
                        progressBar.BeginInvoke(new Action(() => progressBar1.Value = 0));
                }
            ));

            backgroundThread.Start();
        }
DriverBoy
  • 3,047
  • 3
  • 19
  • 21
  • There are a thousand tutorials and how-tos out there... would it have been to much to look at one? – jAC Jan 21 '13 at 17:13
  • Threading tutorial http://msdn.microsoft.com/en-us/library/aa645740(v=vs.71).aspx – The Muffin Man Jan 21 '13 at 17:13
  • Is the requested data gigantic? If so that could make for a slow regex. Otherwise, I suspect that it's not your regex that is slow, but that your request is simply slow to complete for other reasons. I doubt that multi-threading will be at all helpful here. – DWright Jan 21 '13 at 17:13
  • When doing work in a non-UI thread to keep it responsive it doesn't matter what that work is. The code to do that work will be the same regardless. You appear to have done no research at all into how to perform a long running task in a background thread, and you should do that before asking here on SO. We aren't here to do the whole job for you. If you have a problem with your implementation *then* it would be appropriate to post here on SO. – Servy Jan 21 '13 at 17:13
  • 2
    @DWright Looking at the regex, it appears he's trying to search for HTML content, so the string is probably a full HTML page, which could be quite big, making it slow. Obligatory reference to [parsing HTML with regex is not suggested](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Servy Jan 21 '13 at 17:15
  • 3
    You have repetition `.*?` inside a big repetition `+`, which triggers a backtracking hell. – nhahtdh Jan 21 '13 at 17:16
  • @ DWright : Yep, its the request. @ Janes About Chleih & Servy : I'm a c# beginner & I tried to implement it but i failed so that's why i posted here on SO. – DriverBoy Jan 21 '13 at 17:17
  • @Servy, agreed on all counts. However, I'm suprised the regex would be so very slow even on a large page, so I'm also wondering if the main problem isn't actually slowness on the prior request. In any case, threading won't fix. – DWright Jan 21 '13 at 17:18
  • @GeYanZmith What did you try? Post your code here? What didn't work? Did you get an exception, a compile time error, did it behave in an unexpected manor, or what? Go into detail about the specific problems, and the expected/desired behavior – Servy Jan 21 '13 at 17:18
  • GeYanZmith: So if it's the request, threading won't help. It won't make the request go any faster. (But as @Servy points out: threading could keep your UI responsive). – DWright Jan 21 '13 at 17:19
  • @DWright Actually, as long as the slow operation isn't entirely involving interaction with the UI, threading *will* be able to solve the problem no matter what it is. If it's this, or something that happens before/after it, moving that long running task to another thread will keep the UI responsive. – Servy Jan 21 '13 at 17:19
  • @DWright It doesn't need to make the request go any faster, That's not the point. Moving work to a background thread *never* makes it go any faster, in fact, it *always* makes it slower since you're adding overhead of multiple thread interaction. What it does do is keep the UI responsive while that task is happening. – Servy Jan 21 '13 at 17:20
  • But if you want your UI responsive, I'd say that thread is overkill. Just post the request asynchronously and handle with callback. – DWright Jan 21 '13 at 17:21
  • 1
    Using regex to parse HTML is usually a bad idea. Use an HTML parser instead, like HtmlAgilityPack – Jason Jan 21 '13 at 17:23
  • @Servy I just added the code i tried to work with. – DriverBoy Jan 21 '13 at 17:27
  • @GeYanZmith Okay, and what problems are you having with that code? – Servy Jan 21 '13 at 17:29
  • @Servy , It seems to work but i doubt whether its the perfect way and i cannot display any progress to the user .. – DriverBoy Jan 21 '13 at 17:31
  • @GeYanZmith You don't know how long it will take, and have no way of knowing the current percent completion, even at a guess, so you're better off using a marquee bar for this. Just show it at the start, and hide it at the end. Note that using a `BackgroundWorker` is likely to be easier to work with, as it will provide methods/events to help with interacting with the UI thread at appropriate times. – Servy Jan 21 '13 at 17:33
  • @Servy Thank you, So i'll do that. Could you please be kind to verify that my code is perfect or is their any mistakes in it when it comes to handling the background process ? – DriverBoy Jan 21 '13 at 17:39
  • @GeYanZmith It's not perfect, but it's acceptable. – Servy Jan 21 '13 at 17:43
  • @Servy can you help me to get the progressbar inside the thread because this didn't work. `progressBar1.BeginInvoke(new Action(() => progressBar1.Value = 100));` I pasted this just after the message box. have you got any idea why it doesn't work ? Thanks – DriverBoy Jan 21 '13 at 18:03
  • @GeYanZmith Well, as I said, it doesn't really make sense to use a progress bar here in the first place. In any case, what does "it doesn't work" mean. I've asked you to provide details a number of times now, you have yet to do so. – Servy Jan 21 '13 at 18:06
  • @Servy But since it takes time , i think its the best option to let the user know that something is going on. I'm trying to start the progressbar with maruque when the thread starts and when it ends i'm trying to hide or make the progressbars value 100% . I hope you understood it this time. Thank you – DriverBoy Jan 21 '13 at 18:10
  • @GeYanZmith You can't set the progress of a marquee bar; that just doesn't make sense. If you're trying to hide it, then your code is no longer showing what you're doing. Additionally, you've told me what you want to do, and I already knew that. You have yet to tell me *what is actually happening*, so I have no way of knowing what's not working. – Servy Jan 21 '13 at 18:13
  • @Servy As i mentioned, I'm a beginner so i might not have the knowledge that you have. Thanks for the replies . – DriverBoy Jan 21 '13 at 18:15

2 Answers2

0

Here is some demonstration thread code using the System.Threading.Tasks;

Task<string> GetContent(string rawContent)
{
    var task = Task<string>.Factory.StartNew(ProcessContent, rawContent);
    return task;
}

string ProcessContent(object source)
{
    var input = (string)source;
    var match = Regex.Match(input, @"^(.*?<form .*?>(.*?)</form>.*?)+$", RegexOptions.Singleline);
    return match.Success ? match.Value : string.Empty;
}

Alternative syntax:

Task<string> GetContent(string rawContent)
{
    var rc = rawContent;
    var task = Task<string>.Factory.StartNew(() => 
    {
        var match = Regex.Match(rc, @"^(.*?<form .*?>(.*?)</form>.*?)+$", RegexOptions.Singleline);
        return match.Success ? match.Value : string.Empty;
    });
    return task;
}

Usage:

var content = await GetContent(myContent); // Where myContent is your content
Dustin Kingen
  • 20,677
  • 7
  • 52
  • 92
  • 1
    You can't `Start` a task returned from `StartNew`, it's already started. – Servy Jan 21 '13 at 17:27
  • I also much prefer using a lambda with a closure to pass variables to the method the task is running. Not only can you pass more than one variable, unlike this approach, but it will be strongly typed, so there's no need to cast it. – Servy Jan 21 '13 at 17:30
  • @Servy added that for completeness. – Dustin Kingen Jan 21 '13 at 17:36
  • @Romoku Thank you for the snippet, but whats Task ? i don't understand. – DriverBoy Jan 21 '13 at 17:43
  • @GeYanZmith `Task` encapsulates threading logic to make it more productive. The abridged explanation of `StartNew` is that it will create a new thread to process your logic and dispose of it properly after the logic completes. Go read the documentation if you want to know more. The documentation includes some good examples. – Dustin Kingen Jan 21 '13 at 17:49
  • @Romoku Vs2010 gives me 2 errors. Should i declare something before ? – DriverBoy Jan 21 '13 at 17:58
  • @GeYanZmith Make sure your project is targeting .NET 4.0 or higher (Full or Client Profile) and add a reference to `System.Threading.Tasks` Other than that post your errors if that doesn't fix it. – Dustin Kingen Jan 21 '13 at 18:02
  • @Romoku I'm on dotnet 3. I think its producing errors due to that fact. Thank you. – DriverBoy Jan 21 '13 at 18:07
  • @Romoku A `Task` doesn't strictly represent a thread. In fact, it frequently doesn't. It simply represents something that will happen at an unknown point in the future, and provides a means to determine if it has finished, and to run some code when it finishes. – Servy Jan 21 '13 at 18:07
  • @Servy That's why I said abridged since a `Task` is also non-blocking unless specified. I believe Tasks default to the `ThreadPool` as well. – Dustin Kingen Jan 21 '13 at 18:13
  • @Romoku A task may not even represent something being executed in a thread pool. It's not only possible, but common, for a task to represent something that is never executed on another thread. Most tasks generated from a `TaskCompletionSource` fall into this category, as do many tasks representing the completion of IO. – Servy Jan 21 '13 at 18:19
0

Never do time consuming processes on the UI thread. You should be retrieving the web page source asynchronously as well. There are numerous ways to achieve this, but one of the easiest is probably a BackgroundWorker. It includes a convenient method for reporting progress back to the UI thread.

http://msdn.microsoft.com/en-us/library/system.componentmodel.backgroundworker.aspx

If you're on .NET framework 4.5, you should consider using async and await instead.

http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx

JosephHirn
  • 720
  • 3
  • 9