
I'm trying to use Scrapy to get the URL of the image with ID HERO_PHOTO on a page. The target element has the following HTML code:

<img alt="Photo of Gray Line" style="position: relative; left: -50px; top: 0px;" id="HERO_PHOTO" class="flexibleImage" src="https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg" width="352" height="260">

In the Chrome browser, running

$('#HERO_PHOTO').attr('src')

grabs the URL correctly

"https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg"

Problem: However, each of the following selectors in Scrapy,

response.css('#HERO_PHOTO::attr(src)').extract_first()

and

response.css('#HERO_PHOTO').xpath('@src').extract_first()

and

response.css('#HERO_PHOTO[src]').extract_first()

gives us

https://static.tacdn.com/img2/x.gif

Using .extract() also returned the same incorrect URL.

Why is Scrapy grabbing a different SRC value?

Nyxynyx

2 Answers


The image links are in the page, but not directly in the <img> tags. They are filled in by some JavaScript code. There is a JavaScript snippet inside the HTML with the image links you want (reformatted a bit):

...
}(window,ta));
</script>
<script type="text/javascript">
var lazyImgs = [{
    "data": "//maps.google.com/maps/api/staticmap?&channel=ta.desktop&zoom=15&size=340x225&client=gme-tripadvisorinc&sensor=falselanguageParam&center=45.503395,-73.573174&maptype=roadmap&&markers=icon:http%3A%2F%2Fc1.tacdn.com%2Fimg2%2Fmaps%2Ficons%2Fpin_v2_CurrentCenter.png|45.503395,-73.57317&signature=FqI7Z1egbpsVrlEE0yjw9HmsMJ8=",
    "scroll": false,
    "tagType": "img",
    "id": "lazyload_1098682971_0",
    "priority": 500,
    "logerror": false
}, {
    "data": "//ad.atdmt.com/i/img;p=11007200799198;cache=?ord=1475487471489",
    "scroll": false,
    "tagType": "img",
    "id": "lazyload_1098682971_1",
    "priority": 1000,
    "logerror": false
}, {
    "data": "//ad.doubleclick.net/ad/N4764.TripAdvisor/B7050081;sz=1x1?ord=1475487471489",
    "scroll": false,
    "tagType": "img",
    "id": "lazyload_1098682971_2",
    "priority": 1000,
    "logerror": false
}, {
    "data": "https://static.tacdn.com/img2/maps/icons/spinner24.gif",
    "scroll": false,
    "tagType": "img",
    "id": "lazyload_1098682971_3",
    "priority": 100,
    "logerror": false
}, {
    "data": "https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg",
    "scroll": false,
    "tagType": "img",
    "id": "HERO_PHOTO",
    "priority": 100,
    "logerror": false
}, {
    "data": "https://media-cdn.tripadvisor.com/media/photo-s/0c/f5/19/98/montreal-night-tour.jpg",
    "scroll": false,
    "tagType": "img",
    "id": "THUMB_PHOTO1",
    "priority": 100,
    "logerror": false
}, {
    "data": "https://media-cdn.tripadvisor.com/media/photo-s/0c/f5/19/8f/montreal-night-tour.jpg",
    "scroll": false,
    "tagType": "img",
    "id": "THUMB_PHOTO2",
    "priority": 100,
    "logerror": false
}, {
    "data": "https://static.tacdn.com/img2/generic/site/no_user_photo-v1.gif",
    "scroll": false,
    "tagType": "img",
    "id": "lazyload_1098682971_4",
    "priority": 100,
    "logerror": false
}...

One way to parse this is to use js2xml:

from pprint import pprint

import js2xml

# get the content of every `<script>`
for js in response.xpath('.//script[@type="text/javascript"]/text()').extract():
    try:
        jstree = js2xml.parse(js)

        # look for the assignment of `var lazyImgs`
        for imgs in jstree.xpath('//var[@name="lazyImgs"]/*'):

            # use js2xml.make_dict() -- poor name I know
            # to build a useful Python object
            data = js2xml.make_dict(imgs)

            pprint(data)

            break

    except Exception:
        # skip scripts that cannot be parsed or converted
        pass

This is what you get out:

[{'data': '//maps.google.com/maps/api/staticmap?&channel=ta.desktop&zoom=15&size=340x225&client=gme-tripadvisorinc&sensor=falselanguageParam&center=45.503395,-73.573174&maptype=roadmap&&markers=icon:http%3A%2F%2Fc1.tacdn.com%2Fimg2%2Fmaps%2Ficons%2Fpin_v2_CurrentCenter.png|45.503395,-73.57317&signature=FqI7Z1egbpsVrlEE0yjw9HmsMJ8=',
  'id': 'lazyload_-1977833463_0',
  'logerror': False,
  'priority': 500,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/maps/icons/spinner24.gif',
  'id': 'lazyload_-1977833463_1',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg',
  'id': 'HERO_PHOTO',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-s/0c/f5/19/98/montreal-night-tour.jpg',
  'id': 'THUMB_PHOTO1',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-s/0c/f5/19/8f/montreal-night-tour.jpg',
  'id': 'THUMB_PHOTO2',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/generic/site/no_user_photo-v1.gif',
  'id': 'lazyload_-1977833463_2',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/08/38/19/cb/gayle-h.jpg',
  'id': 'lazyload_-1977833463_3',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/lvl_01.png',
  'id': 'lazyload_-1977833463_4',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_02.png',
  'id': 'lazyload_-1977833463_5',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_6',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_7',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/b1/32/93/holidays1958.jpg',
  'id': 'lazyload_-1977833463_8',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/lvl_04.png',
  'id': 'lazyload_-1977833463_9',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_04.png',
  'id': 'lazyload_-1977833463_10',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
  'id': 'lazyload_-1977833463_11',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_12',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_13',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-o/06/4d/bc/f6/disneybus.jpg',
  'id': 'lazyload_-1977833463_14',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/lvl_06.png',
  'id': 'lazyload_-1977833463_15',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_06.png',
  'id': 'lazyload_-1977833463_16',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
  'id': 'lazyload_-1977833463_17',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_18',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_19',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/a7/avatar078.jpg',
  'id': 'lazyload_-1977833463_20',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_01.png',
  'id': 'lazyload_-1977833463_21',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_22',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_23',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/9f/avatar070.jpg',
  'id': 'lazyload_-1977833463_24',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/lvl_02.png',
  'id': 'lazyload_-1977833463_25',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_03.png',
  'id': 'lazyload_-1977833463_26',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_27',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_28',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/03/9f/a6/94/facebook-avatar.jpg',
  'id': 'lazyload_-1977833463_29',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/lvl_04.png',
  'id': 'lazyload_-1977833463_30',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_05.png',
  'id': 'lazyload_-1977833463_31',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
  'id': 'lazyload_-1977833463_32',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_33',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_34',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/06/f3/32/86/complsv.jpg',
  'id': 'lazyload_-1977833463_35',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/lvl_04.png',
  'id': 'lazyload_-1977833463_36',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_05.png',
  'id': 'lazyload_-1977833463_37',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
  'id': 'lazyload_-1977833463_38',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_39',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_40',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/05/f2/4d/68/christine-n.jpg',
  'id': 'lazyload_-1977833463_41',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/lvl_03.png',
  'id': 'lazyload_-1977833463_42',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_04.png',
  'id': 'lazyload_-1977833463_43',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
  'id': 'lazyload_-1977833463_44',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_45',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_46',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/80/avatar001.jpg',
  'id': 'lazyload_-1977833463_47',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/lvl_03.png',
  'id': 'lazyload_-1977833463_48',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_04.png',
  'id': 'lazyload_-1977833463_49',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
  'id': 'lazyload_-1977833463_50',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_51',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_52',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/0a/45/46/e2/tracey-g.jpg',
  'id': 'lazyload_-1977833463_53',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/lvl_06.png',
  'id': 'lazyload_-1977833463_54',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/rev_06.png',
  'id': 'lazyload_-1977833463_55',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
  'id': 'lazyload_-1977833463_56',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
  'id': 'lazyload_-1977833463_57',
  'logerror': False,
  'priority': 100,
  'scroll': False,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
  'id': 'lazyload_-1977833463_58',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-f/02/6d/40/b2/montreal-amphi-bus-tour.jpg',
  'id': 'lazyload_-1977833463_59',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/39/2d/43/old-montreal-walking.jpg',
  'id': 'lazyload_-1977833463_60',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/06/df/96/c7/excursions-montreal-private.jpg',
  'id': 'lazyload_-1977833463_61',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/02/ad/57/0a/filename-p1010076-jpg.jpg',
  'id': 'lazyload_-1977833463_62',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-o/04/b5/6a/8d/ali-l.jpg',
  'id': 'lazyload_-1977833463_63',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/87/avatar008.jpg',
  'id': 'lazyload_-1977833463_64',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-o/06/8a/c5/7d/leonard-d.jpg',
  'id': 'lazyload_-1977833463_65',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-o/05/6d/32/ca/rpm13111.jpg',
  'id': 'lazyload_-1977833463_66',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/87/avatar008.jpg',
  'id': 'lazyload_-1977833463_67',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/neighborhood/icon_hood_white.png',
  'id': 'lazyload_-1977833463_68',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/oyster/500/08/5b/34/b0/sherbrooke-street-west-shopping--.jpg',
  'id': 'lazyload_-1977833463_69',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/maps/icons/icon_mapControl_expand_idle_30x30.png',
  'id': 'lazyload_-1977833463_70',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/maps/icons/icon_mapControl_expand_hover_30x30.png',
  'id': 'lazyload_-1977833463_71',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/a1/f2/6b/marche-atwater.jpg',
  'id': 'lazyload_-1977833463_72',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/41/78/a3/mcgill-university-lower.jpg',
  'id': 'lazyload_-1977833463_73',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/04/06/16/08/musee-grevin.jpg',
  'id': 'lazyload_-1977833463_74',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/03/4a/9a/85/laurie-raphael.jpg',
  'id': 'lazyload_-1977833463_75',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/09/45/53/16/cafe-humble-lion.jpg',
  'id': 'lazyload_-1977833463_76',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://media-cdn.tripadvisor.com/media/photo-l/03/2f/37/03/essence.jpg',
  'id': 'lazyload_-1977833463_77',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/branding/logo_with_tagline.png',
  'id': 'LOGOTAGLINE',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'},
 {'data': 'https://static.tacdn.com/img2/icons/bell.png',
  'id': 'lazyload_-1977833463_78',
  'logerror': False,
  'priority': 100,
  'scroll': True,
  'tagType': 'img'}]
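
If you would rather not add a dependency, a rougher alternative is to pull the array out with a regular expression and parse it as JSON. This is only a sketch: it assumes the array literal is strict JSON and is terminated by `];`, which the truncated snippet above does not confirm. `SAMPLE_HTML` is a hypothetical, shortened stand-in for `response.text`, and `lazy_image_urls` is a helper name made up for this example.

```python
import json
import re

# Hypothetical, shortened stand-in for response.text (the real page is far larger).
SAMPLE_HTML = """
<img id="HERO_PHOTO" src="https://static.tacdn.com/img2/x.gif">
<script type="text/javascript">
var lazyImgs = [{
    "data": "https://static.tacdn.com/img2/maps/icons/spinner24.gif",
    "scroll": false,
    "id": "lazyload_1098682971_3"
}, {
    "data": "https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg",
    "scroll": false,
    "id": "HERO_PHOTO"
}];
</script>
"""

# Assumption: the array ends with "];" and contains no "];" inside a string.
LAZY_IMGS_RE = re.compile(r"var\s+lazyImgs\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def lazy_image_urls(html):
    """Return a dict mapping element id to image URL, or {} if nothing is found."""
    match = LAZY_IMGS_RE.search(html)
    if match is None:
        return {}
    entries = json.loads(match.group(1))
    return {entry["id"]: entry["data"] for entry in entries}


urls = lazy_image_urls(SAMPLE_HTML)
print(urls.get("HERO_PHOTO"))
```

In a spider you would call `lazy_image_urls(response.text)` and look up `"HERO_PHOTO"`. If the array ever stops being strict JSON (single quotes, trailing commas, unquoted keys), `json.loads` will raise, and a real JavaScript parser such as js2xml is the safer choice.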
paul trmbrth

I believe you are using the wrong CSS selector. According to W3Schools, an attribute selector such as [src] targets elements that have the attribute you want.

Try this.

response.css('#HERO_PHOTO[src]').extract_first()

My next suggestion is to see what you get without calling extract_first(). Check whether the URL is in the return value of response.css('#HERO_PHOTO[src]').

EDIT: I think the issue you're experiencing is that you are querying the page source, not the rendered HTML. Here's a link to what I believe is happening:

This question's first answer

You are querying what the server responded with, not the page after JavaScript has had a chance to manipulate it.
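
To illustrate the point, here is a small sketch using only the standard library. `RAW_HTML` is an assumed reconstruction of what the server sends for this element (the question only shows the rendered element and the `x.gif` value Scrapy returned), and `ImgSrcById` is a helper class made up for this example. Any parser that does not execute JavaScript, Scrapy's selectors included, can only see the attribute as delivered:

```python
from html.parser import HTMLParser


class ImgSrcById(HTMLParser):
    """Collect the src attribute of every <img> element that has an id."""

    def __init__(self):
        super().__init__()
        self.src_by_id = {}

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if "id" in attributes:
                self.src_by_id[attributes["id"]] = attributes.get("src")


# Assumed server-side markup: the src is still the placeholder image,
# because the lazy-loading script has not run.
RAW_HTML = (
    '<img alt="Photo of Gray Line" id="HERO_PHOTO" class="flexibleImage" '
    'src="https://static.tacdn.com/img2/x.gif" width="352" height="260">'
)

parser = ImgSrcById()
parser.feed(RAW_HTML)
print(parser.src_by_id["HERO_PHOTO"])  # prints the placeholder URL
```

The browser shows the real photo URL only because a script replaces `src` after the page loads, so a quick check is to search the raw response body for the URL you expect and see where it actually lives.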

ddeamaral
  • `response.css('#HERO_PHOTO[src]').extract()` gets me `[u'Photo of PHI Centre']` – Nyxynyx Oct 02 '16 at 13:55
  • are you able to post the html you're trying to extract from? – ddeamaral Oct 02 '16 at 13:57
  • This is the page https://www.tripadvisor.com/Attraction_Review-g155032-d1494256-Reviews-Gray_Line-Montreal_Quebec.html. The HTML chunk I am targeting is in the question. – Nyxynyx Oct 02 '16 at 13:58
  • There doesn't seem to be what you're looking for in the page source. – ddeamaral Oct 02 '16 at 14:02
  • Modified my answer – ddeamaral Oct 02 '16 at 14:08
  • Will using Selenium with Scrapy be a good solution for this? This is my first time using Scrapy. I have never used Selenium, but used quite a lot of PhantomJS with NodeJS – Nyxynyx Oct 02 '16 at 14:11
  • I'm not sure. I am not too familiar with Selenium, or how it works under the hood. I would poke around some forums to find out if it or any other scraping libraries actually can scrape the current live page, not just the server response. Not sure how scraping libraries parse pages under the hood. Might be able to read through the documentation to find out if any do use a live rendering of the page after javascript has had a chance to execute. – ddeamaral Oct 02 '16 at 14:14