
I am using Python Scrapy, trying to get the car names on this page:

https://youjia.baidu.com/view/carDatabase?title=%E7%8E%B0%E4%BB%A3&key=code&val=174&sa=pc_growth_1

but when I use the method below

len(response.css('p.car-name::text').getall())

I can only get 25 out of the actual 46 elements. I've examined the page source carefully, and I am using the Selenium middleware. Why is this happening? Does anybody have a clue?

The complete source is here:

https://github.com/sampan0423/youjia1
sampan0423
  • It may be about `infinite scroll`, which loads new items as you scroll. When you first open the page, it loads 25, but depending on your screen size it begins to load new ones as you scroll down. A scraper may not be able to get those new items. You can inspect the requests via the `Network -> XHR` tab of your developer console if you are using Chrome (or similar). – Harun Yilmaz May 17 '21 at 03:44
  • You can also use the direct URL to the API. For example `https://youjia.baidu.com/conditionsearch?token=1_526c1239fc0b0512a2bd13ac6b962f5f&sort=4&brand=174&level=&country=&price=&rn=50`. You can modify the `rn` parameter (possibly "result number") to get as many results as you want in JSON form (a sketch of this approach follows after these comments). – Harun Yilmaz May 17 '21 at 03:50
  • @HarunYilmaz Yeah, your answer is very good, but I don't know how to use these techniques. "Infinite scroll" and "direct URL" are quite new to me. I am using the Scrapy framework, so maybe I can integrate "infinite scroll" handling into Scrapy. – sampan0423 May 17 '21 at 06:49
  • Yeah, your answer is very good, but I don't know how to use these techniques yet. "Infinite scroll" and "direct URL" are quite new to me. The "direct URL" seems to return pure JSON, right? I need to learn more before I can extract the info I need from it; since the token/sort doesn't change between different brands, this might be the easier way. On the other hand, I am using the Scrapy framework, so maybe I can integrate "infinite scroll" into Scrapy. From what I've searched, maybe I need to use Selenium to scroll down the pages. @Harun Yilmaz – sampan0423 May 17 '21 at 07:10
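
For reference, here is a minimal sketch of the direct-URL approach suggested in the comments, assuming the token and query parameters quoted above are still valid. The structure of the returned JSON is not documented in the question, so the example only fetches and prints the raw payload:

```python
import requests

# Sketch of the direct-URL approach from the comments above.
# The token and query parameters are copied verbatim from the comment;
# they may expire or be tied to a session, so verify them in the
# Network -> XHR tab of the browser's developer tools first.
url = (
    "https://youjia.baidu.com/conditionsearch"
    "?token=1_526c1239fc0b0512a2bd13ac6b962f5f"
    "&sort=4&brand=174&level=&country=&price=&rn=50"
)

response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
data = response.json()

# The key names inside the JSON are not known from the question,
# so just dump the payload and look for the car names in it.
print(data)
```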

1 Answer


sampan, you can write code in Scrapy to deal with the "load more" behaviour; you don't need Selenium, provided you can keep accessing the JSON via the API.

See this previous post: Scrapy Extract ld+JSON
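
A minimal Scrapy sketch of that idea, assuming the JSON endpoint quoted in the comments still responds; the spider name and the handling in `parse` are placeholders to adapt once you know the actual response structure:

```python
import json
import scrapy


class YoujiaSpider(scrapy.Spider):
    # Placeholder spider name; the token and parameters come from the
    # comment thread above and may need to be refreshed.
    name = "youjia_cars"
    start_urls = [
        "https://youjia.baidu.com/conditionsearch"
        "?token=1_526c1239fc0b0512a2bd13ac6b962f5f"
        "&sort=4&brand=174&level=&country=&price=&rn=50"
    ]

    def parse(self, response):
        # The endpoint returns JSON rather than HTML, so parse the body
        # directly instead of using CSS selectors.
        data = json.loads(response.text)
        # Yield the whole payload for now; replace this with the specific
        # car-name fields once the JSON structure is known.
        yield {"raw": data}
```

If the endpoint rejects Scrapy's default user agent, you can set `USER_AGENT` in the project settings.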

Dr Pi