I'm trying to crawl this page using Perl LWP:

http://livingsocial.com/cities/86/deals/138811-hour-long-photo-session-cd-and-more

I had code that used to handle LivingSocial, but it seems to have stopped working. The idea was to fetch the page once, extract its cookie, set the cookie in the UserAgent, and then fetch the page twice more. Doing this used to get you past the welcome page:

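$response = $browser->get($url);            # first request returns the welcome page
$cookie_jar->extract_cookies($response);    # copy its Set-Cookie headers into the jar
$browser->cookie_jar($cookie_jar);          # attach the jar to the user agent
$response = $browser->get($url);            # repeat requests now send the cookie back
$response = $browser->get($url);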
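For reference, here is a self-contained sketch of the same idea. It assumes LWP::UserAgent and HTTP::Cookies are installed; the helper names make_browser and fetch_repeatedly, the 30-second timeout, and the choice of three attempts are made up for illustration. When the cookie jar is attached as the agent is created, LWP stores and resends cookies by itself, so no explicit extract_cookies call is needed. This only helps where a cookie alone gates the welcome page.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a user agent with an in-memory cookie jar attached from the start,
# so cookies from each response are stored and sent with later requests.
sub make_browser {
    my $cookie_jar = HTTP::Cookies->new;
    return LWP::UserAgent->new(
        cookie_jar => $cookie_jar,
        timeout    => 30,    # hypothetical value, adjust as needed
    );
}

# Request $url $attempts times with the same agent; return the last response.
sub fetch_repeatedly {
    my ($browser, $url, $attempts) = @_;
    my $response;
    for (1 .. $attempts) {
        $response = $browser->get($url);
    }
    return $response;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $url      = shift @ARGV;
    my $browser  = make_browser();
    my $response = fetch_repeatedly($browser, $url, 3);
    print $response->status_line, "\n";
    print $browser->cookie_jar->as_string;    # cookies collected along the way
}
```

Run it with the page URL as the only argument. If the jar printed at the end is empty, or the body is still the welcome page, the cookie approach is probably not enough for that page.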

This approach no longer works for normal LivingSocial pages, but it still seems to work for LivingSocial Escapes pages, for example:

http://livingsocial.com/escapes/148029-cook-islands-hotel-+-airfare

Any tips on how to get past the welcome page?

lexu
Vijay Boyapati

1 Answer

It looks like this page only works in a JavaScript-enabled browser, which LWP::UserAgent is not. You could try WWW::Mechanize::Firefox instead:

use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
$mech->get($url);

Note that you must have Firefox and the MozRepl extension installed for this module to work.

Eugene Yarmash
  • Could you perhaps give me a short example of how I would do this with Mechanize? Thanks – Vijay Boyapati Oct 22 '11 at 22:29
  • Hmm, so I installed Mechanize::Firefox and cpan tells me MozRepl is up to date. But when I run the code you supplied I get: Failed to connect to , problem connecting to "localhost", port 4242: Connection refused at /usr/local/share/perl/5.10.1/MozRepl/Client.pm line 144 – Vijay Boyapati Oct 22 '11 at 23:16
  • @Vijay: see this [question](http://stackoverflow.com/questions/7417904/cant-create-an-instance-of-wwwmechanizefirefox) – Eugene Yarmash Oct 22 '11 at 23:20
  • Oh wow - it actually interacts with the Firefox browser! Hmm, I was hoping there was a simpler solution to this (although I really appreciate your replies, Eugene) because the production environment that I'm running the crawler on doesn't have Firefox running on it. From what I see in the actual LivingSocial page, I don't see any relevant JavaScript code. I believe it's just setting a cookie with a dirty bit for whether you've seen the welcome page. – Vijay Boyapati Oct 22 '11 at 23:28
  • Btw Eugene, your solution works (thanks!) I just wish there was something more lightweight. It feels like I'm smashing an egg with a battle axe :) – Vijay Boyapati Oct 22 '11 at 23:43
  • @Vijay: I understand you :) But it's definitely a JS issue. Check it yourself: 1) in your browser, remove all cookies for the domain 2) disable JS 3) navigate to the page – Eugene Yarmash Oct 22 '11 at 23:58
  • You are indeed correct, Sir. So as a follow-up question, Eugene, could you suggest how I would get this working in a production environment? I have some servers at Rackspace. Would I just install Firefox on these servers? I see how to install MozRepl and get it to work in my desktop Linux environment, but I'm not sure how to do it on a remote server. – Vijay Boyapati Oct 23 '11 at 00:05