Questions tagged [scraper]

Synonym of web-scraping: Let's [scrape] these tags off the bottom of our shoe

349 questions

106

votes

3 answers

XPath:: Get following Sibling

I have the following HTML structure: I am trying to build a robust method to extract the second color digest element, since there will be many of these tags within the DOM. …

asked Jul 25 '12 at 19:33

add-semi-colons

18,094
55
145
232

votes

7 answers

crawler vs scraper

Can somebody distinguish between a crawler and a scraper in terms of scope and functionality?

web-crawler terminology scraper

asked Jul 08 '10 at 19:56

Nayn

3,594
8
38
48

votes

8 answers

BeautifulSoup: extract text from anchor tag

I want to extract the text of the src attribute of the following image tag and the text of the anchor tag inside the div with class data. I successfully managed to extract the img src, but am having trouble extracting the text from the anchor tag.

python html beautifulsoup tags scraper

asked Jul 30 '12 at 06:32

add-semi-colons

18,094
55
145
232

votes

3 answers

How to scrape a website that requires login first with Python

First of all, I think it's worth saying that I know there are a bunch of similar questions, but NONE of them work for me... I'm a newbie at Python, HTML, and web scraping. I'm trying to scrape user information from a website which requires login…

python http cookies authorization scraper

asked Nov 18 '13 at 03:35

user2830451

2,126
5
25
31

votes

3 answers

scrape websites with infinite scrolling

I have written many scrapers, but I am not really sure how to handle infinite scrolling. These days most websites, e.g. Facebook and Pinterest, have infinite scrolling.

python screen-scraping scraper

asked Sep 20 '12 at 18:56

add-semi-colons

18,094
55
145
232

votes

3 answers

How to use Selenium Webdriver on Heroku?

I am developing a Node.js app, and I use Selenium Webdriver on it for scraping purposes. However, when I deploy on Heroku, Selenium doesn't work. How can I make Selenium work on Heroku?

node.js selenium heroku webdriver scraper

asked Mar 17 '17 at 14:56

Athanasios Canko

votes

5 answers

Facebook meta tags scraped with locale not working

My website is multi-language and I have a FB like button. I'd like to have the like posts in different languages. According to the Facebook documentation, if I use the meta tags og:locale and og:locale:alternate, the scraper would get my site info…

facebook facebook-like locale scraper

asked Sep 30 '11 at 18:34

Alouw Net

votes

5 answers

BeautifulSoup: Strip specified attributes, but preserve the tag and its contents

I'm trying to 'defrontpagify' the HTML of an MS FrontPage-generated website, and I'm writing a BeautifulSoup script to do it. However, I've gotten stuck on the part where I try to strip a particular attribute (or list of attributes) from every tag in…

python web-scraping beautifulsoup scraper frontpage

asked Jan 28 '12 at 09:03

bgibson

17,379
8
29
45

votes

2 answers

Facebook scraper doesn't load dynamic meta-tags

I am creating the HTML meta-tags dynamically using the function below (GWT). It takes 1 second for them to appear in the DOM. It works fine except for Facebook. When I share a link from my website, the scraper gets the meta-tags that are in the HTML:…

html facebook web-scraping meta-tags scraper

asked Feb 15 '13 at 16:08

user411103

votes

1 answer

Crawling LinkedIn while authenticated with Scrapy

So I've read through "Crawling with an authenticated session in Scrapy" and I am getting hung up. I am 99% sure that my parse code is correct; I just don't believe the login is redirecting and succeeding. I also am having an issue with the…

python linkedin-api scrapy scraper

asked Jun 08 '12 at 18:16

Gates

votes

2 answers

Scrapy Body Text Only

I am trying to scrape only the text from the body using Python Scrapy, but haven't had any luck yet. I'm hoping someone can help me scrape all the text from the body tag.

python scrapy scrape scraper

asked Mar 22 '11 at 10:59

mmrs151

3,924
2
34
38

votes

7 answers

Print Python output by PHP Code

I have a scraper, written in Python, which scrapes one site. While scraping the site, it prints the lines that are about to be written to a CSV file. Now I want to execute it via PHP code. My question is how can I print…

php python scraper

asked Dec 09 '12 at 11:19

Rajiv Pingale

votes

3 answers

How to crawl with php Goutte and Guzzle if data is loaded by Javascript?

Many times when crawling we run into problems where content that is rendered on the page is generated with JavaScript and therefore scrapy is unable to crawl it (e.g. Ajax requests, jQuery).

php web-crawler guzzle scraper goutte

asked Apr 17 '16 at 07:04

Batman

votes

2 answers

Can't get Scrapy pipeline to work

I have a spider that I have written using the Scrapy framework. I am having some trouble getting any pipelines to work. I have the following code in my pipelines.py: class FilePipeline(object): def __init__(self): self.file =…

python web-crawler pipeline scrapy scraper

asked Nov 03 '10 at 19:21

Jim Jeffries

9,841
15
62
103

votes

2 answers

XPath recursive children selection

I'm using Scrapy to extract data from a website, but I have a problem with the XPath selector. Assuming I have this HTML code:

Hi!

I am a child!

I am a span…

html xpath scrapy scraper

asked Sep 17 '13 at 21:24

bukk530

1,858
2
20
30
