
I'm writing a program to scrape data from IMDb, using an IMDb scraping API. It works wonderfully, yet sometimes it just stops. No exception is thrown, there is no error, nothing shows up in IntelliTrace, and I can't find a reason why it stops. What's interesting is that the point where it stops is totally random!

For example, if I start it, it scrapes data successfully 100 times, but if I restart it, it gets stuck after 50. I truly have no idea why it does this.

If I pause the code while it's stuck, it doesn't show anything (as if it were running normally without any errors), or I just don't notice it. The green marker on the left is at

IMDb imdb_movie = new IMDb(link, false);

The source code can be found here

Any ideas? Thanks in advance!

J...
  • When it gets stuck, *where* is it stuck? What's the last thing it tried to do? (You need to debug.) – David Schwartz Jun 24 '12 at 10:21
  • That's the point! I can't tell, because if I pause the code if it's stuck, it doesn't write anything (like it would be running normally without any errors), (or I just don't notice it, I'm quite a noob actually) the green marker on the left is at "IMDb imdb_movie = new IMDb(link, false);" line.. but gonna debug now, and wait for it to get stuck, and will share results! –  Jun 24 '12 at 10:26
  • I don't know the content of your file, but when it finishes processing the file it is bound to stop. Try putting a Console.ReadKey() at the end of Main – armin Jun 24 '12 at 10:28
  • @Levela I added your extra information into the main post. It is good practice to refine your post with any additional information requested in comments as it helps keep things clear and organized. – J... Jun 24 '12 at 13:11

1 Answer

This sounds like a bug in the API you are using. I would take it up with the developer, or download the class file he provides so that you can debug it yourself. If you installed the DLL without source, then pausing shows only the green arrow while the IDE waits for the external code to complete. If you add the class file to your project instead, you can step through it and see where it is getting stuck.

Also, fundamentally: Why regex is probably a bad idea here...

J...
  • Thank you, doing this right now! :> Gonna check back as soon as I have some results! –  Jun 24 '12 at 10:52
  • Oh, got it! It gets stuck here: [link](http://gyazo.com/f6be5de0214d93be2c298951d26647eb) –  Jun 24 '12 at 11:02
  • Well, actually, I'm happy that the problem is found, but I still don't know how to fix it :< I'll have to look for another method, I assume. Well, I could write my own class with HtmlAgilityPack anyway! –  Jun 24 '12 at 11:36
  • @Levela It is getting stuck at the `match` call, but you should add some debug lines or breakpoints to find which line it is being called from (probably some line in `parseIMDBPage`). One of those regular expressions is likely suffering from something like catastrophic backtracking : http://www.regular-expressions.info/catastrophic.html – J... Jun 24 '12 at 12:48
  • @Levela ...from the link above : *RegexBuddy is forgiving in that it detects it's going in circles, and aborts the match attempt. Other regex engines (like .NET) will keep going forever* – J... Jun 24 '12 at 12:48
  • @Levela - if you find the offending regex, it would be worth posting another question if you can't figure out what is causing it to run away. I'm not a regex wizard but there are many here who are. – J... Jun 24 '12 at 12:51
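
One way to track down a runaway pattern like the one discussed in the comments above is to give every match a time limit, so a stuck regex reports itself instead of hanging silently. The following is only a minimal sketch, not the scraping API's own code. It assumes .NET 4.5 or later, where the `Regex` constructor accepts a `matchTimeout`, and the `SafeRegex` class and `MatchOrNull` method are hypothetical names made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the IMDb scraping API): runs a single
// regex match with a time limit so that a catastrophically backtracking
// pattern fails loudly instead of appearing to hang.
public static class SafeRegex
{
    // Returns the matched text, or null if there is no match or the
    // match attempt exceeds the given timeout.
    public static string MatchOrNull(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            // The three-argument constructor (pattern, options, matchTimeout)
            // is available from .NET 4.5 onwards.
            var regex = new Regex(pattern, RegexOptions.None, timeout);
            Match m = regex.Match(input);
            return m.Success ? m.Value : null;
        }
        catch (RegexMatchTimeoutException)
        {
            // Log the pattern so the offending regex can be identified.
            Console.Error.WriteLine("Regex timed out: " + pattern);
            return null;
        }
    }
}
```

A call would look something like `SafeRegex.MatchOrNull(pattern, html, TimeSpan.FromSeconds(2))`, where `pattern` and `html` stand in for whatever the scraper passes to its own `match` method. Any pattern that gets logged is a candidate for the runaway regex and could then be posted as a separate question.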