
Below is an extract from an IMDb page, which I purposely kept short. My end goal is to get the two links. But I can't even figure out how to get a specific div by its id, and selecting by class alone doesn't help because the class below is used all over the page. I've Googled for an example that uses both class and id, but I still can't find the solution.

p.s. The only reason I have Dumper in there is so when I run it, I can instantly see how I still haven't got it.

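    use Mojo::UserAgent;
    use Data::Dumper;

    # $newrip holds the URL of the IMDb title page.
    # Mojo::UserAgent has no "timeout" attribute; request_timeout is assumed to be the intent.
    my $ua = Mojo::UserAgent->new( max_redirects => 3, request_timeout => 30 );
    my $dom = $ua->get($newrip)->result->dom;

    # Matches every div with class "article", not only the one with id "titleDetails".
    my $module_list = $dom->find('div.article');
    print Dumper $module_list;
    exit;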



                        <div class="article" id="titleDetails">
                            <span class=rightcornerlink>
                                <a href="/register/login?why=edit&ref_=tt_dt_dt" rel="login">Edit</a>
                            </span>
                            <h2>Details</h2>
                            <div class="txt-block">
                                <h4 class="inline">Official Sites:</h4>
                                <a href="/offsite/?page-action=offsite-facebook&token=BCYtlGRCvvzcjTOrSRBQqjTPuEUBGkxbnkfQjRYZi0XJxm-A-A4vf0mzJF5WqH6HYLt2TZCuVR7c%0D%0A209QQMCwUe-51EwtDDYbNczYCnFRIRzctUhoXJCF2gsQJw6m050sV9g0sTJJEfiGP37rfeIIoXMS%0D%0ACfj2qgUNCaL2YaP_FeWVGCg39Bw-3dRsP5cB1Wk9FfobPd5tG8Q4WjVbUR2pTOvE0Pkc5QUK5E7U%0D%0AX7O9awNb0Kw%0D%0A&ref_=tt_pdt_ofs_offsite_0"

                                    rel="nofollow">Official Facebook</a>

                                <span class="ghost">|</span>

                                <a href="/offsite/?page-action=offsite-uphe&token=BCYp_aZBofVk38VkRssgAyKsF6hpxed35fbTVV8UlILwujLNjtFs2jVedrKeAqxUjICHsNP4JDb6%0D%0AbL8Kmal9v2xFtXeROzfpKNd_XK78SUE6Cw7zJEoHxf9Ad6Cg2bzVOYY8FK26b5cQS2-Rk3oYF3zH%0D%0AF0eXIUjmsj6NPfx_Tc5BoWZU-8nwSJptfgT4OqUPlYMhxtdzjFjaXgWHNLt0uV9sf3Zuz50I9GFo%0D%0AdEv_xXgUbfo%0D%0A&ref_=tt_pdt_ofs_offsite_1" rel="nofollow">Official site</a>
                                <span class="see-more inline"></span>
                            </div>
LuisC329
  • `my $anchor_tags = $dom->find('div.article a'); my $anchor_hrefs = $anchor_tags->map(attr => 'href');` For an id you can write `$dom->find('div#titleDetails a');`. Whenever I need an example for Mojo::DOM I look at https://www.perl.com/article/143/2015/1/8/Extracting-from-HTML-with-Mojo-DOM/ – rai-gaurav Jan 14 '21 at 06:49
  • @Maverick Thank you for that link, it's the one I have been studying. What seems to be going over my head is how to combine two searches. i.e. Class and Title, and then find everything in that subsection. Mojo is definitely way more powerful than WWW::Mechanize, but it's throwing me for a loop – LuisC329 Jan 14 '21 at 16:16
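
Following the selectors suggested in the first comment, here is a minimal sketch of how a class and an id might be combined in one selector and the links under that subsection collected. It runs against an inline stand-in for the extract instead of the live page: the `$html` string, the shortened `href` values, and the extra `div#other` block are placeholders added for illustration. It also assumes the two wanted links are the `rel="nofollow"` ones inside `div.txt-block`, which the question does not state outright.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Shortened stand-in for the IMDb extract; href values are placeholders.
my $html = <<'HTML';
<div class="article" id="other">
  <a href="/not-wanted" rel="nofollow">Elsewhere</a>
</div>
<div class="article" id="titleDetails">
  <span class="rightcornerlink">
    <a href="/register/login?why=edit" rel="login">Edit</a>
  </span>
  <h2>Details</h2>
  <div class="txt-block">
    <h4 class="inline">Official Sites:</h4>
    <a href="/offsite/?page-action=offsite-facebook" rel="nofollow">Official Facebook</a>
    <span class="ghost">|</span>
    <a href="/offsite/?page-action=offsite-uphe" rel="nofollow">Official site</a>
  </div>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Compound selector: element "div" with class "article" AND id "titleDetails",
# then descend to the nofollow anchors inside its txt-block.
my $links = $dom->find('div.article#titleDetails div.txt-block a[rel="nofollow"]')
                ->map( attr => 'href' )
                ->to_array;

# Equivalent two-step form: grab the section first, then search inside it.
my $section = $dom->at('div#titleDetails');
my $same    = $section->find('div.txt-block a[rel="nofollow"]')
                      ->map( attr => 'href' )
                      ->to_array;

print "$_\n" for @$links;
```

With the real page, `$dom` would come from `$ua->get($newrip)->result->dom` as in the question. The two-step form is probably the easier one to debug, since `at` returns `undef` when the id is missing and that can be checked before calling `find`.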
