Questions tagged [scraperwiki]

ScraperWiki was an online tool for screen scraping.

ScraperWiki was a platform for writing and scheduling screen scrapers, and for storing the data they generated. It supported Ruby, Python and PHP. A later version of the service was called QuickCode, which has also been decommissioned.

"Scraper" refers to screen scrapers, programs that extract data from websites. "Wiki" means that any user with programming experience can create or edit such programs for extracting new data, or for analyzing existing datasets.

68 questions
61 votes · 15 answers

How to install Poppler on Windows?

The most recent version of ScraperWiki depends on Poppler (or so the GitHub says). Unfortunately, it only specifies how to get it on macOS and Linux, not Windows. A quick googling turned up nothing too promising. Does anyone know how to get Poppler…
akkatracker · 1,397
9 votes · 3 answers

Encoding error while parsing RSS with lxml

I want to parse downloaded RSS with lxml, but I don't know how to handle with UnicodeDecodeError? request = urllib2.Request('http://wiadomosci.onet.pl/kraj/rss.xml') response = urllib2.urlopen(request) response = response.read() encd =…
domi · 189
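The usual fix for this class of error is to hand the raw bytes straight to the XML parser rather than decoding them yourself, so the parser can honour the encoding declared in the feed. A minimal sketch with the stdlib ElementTree (`lxml.etree.fromstring` accepts bytes the same way); the feed content below is invented, with ISO-8859-2 chosen because it is a common encoding for Polish feeds like the one in the question:

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the downloaded feed: bytes in a non-UTF-8
# encoding, with that encoding declared in the XML prolog.
raw = ('<?xml version="1.0" encoding="ISO-8859-2"?>'
       '<rss><channel><title>Wiadomo\u015bci</title></channel></rss>'
       ).encode("iso-8859-2")

try:
    raw.decode("utf-8")     # decoding by hand with the wrong codec...
except UnicodeDecodeError:
    pass                    # ...is what raises the question's error

# Passing the *bytes* lets the parser read the declared encoding itself.
root = ET.fromstring(raw)
title = root.find("channel/title").text
print(title)  # Wiadomości
```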
7 votes · 1 answer

Can I install the "scraperwiki" library locally?

Is the scraperwiki python module available for install outside of the Scraperwiki.com web interface? It looks like the source is available, but not packaged.
Amanda · 12,099
7 votes · 2 answers

Scraperwiki + lxml. How to get the href attribute of a child of an element with a class?

On the link that contains 'alpha' in the URL has many links (hrefs) which I would like to collect from 20 different pages and paste onto the end of the general url (second last line). The href are found in a table which class is mys-elastic mys-left…
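The pattern being asked about is "find cells with a given class, then read the `href` of the child link". A sketch with the stdlib ElementTree on an invented table fragment; on ScraperWiki the equivalent lxml XPath would be `doc.xpath('//td[@class="mys-elastic mys-left"]/a/@href')`. Note the class string is compared verbatim here, which only works when the attribute matches exactly:

```python
import xml.etree.ElementTree as ET

# Invented fragment mirroring the question's table: some cells carry
# the target class, divider cells do not.
frag = """<table>
<tr><td class="mys-elastic mys-left"><a href="/event/1">one</a></td></tr>
<tr><td class="other"><a href="/skip">no</a></td></tr>
<tr><td class="mys-elastic mys-left"><a href="/event/2">two</a></td></tr>
</table>"""

root = ET.fromstring(frag)
# Keep only cells whose class attribute matches, then read the child
# <a>'s href attribute from each.
hrefs = [td.find("a").get("href")
         for td in root.iter("td")
         if td.get("class") == "mys-elastic mys-left"]
print(hrefs)  # ['/event/1', '/event/2']
```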
6 votes · 2 answers

Screen scraping aspx with Python Mechanize - Javascript form submission

I'm trying to scrape UK Food Ratings Agency data aspx seach results pages (e.,g http://ratings.food.gov.uk/QuickSearch.aspx?q=po30 ) using Mechanize/Python on scraperwiki ( http://scraperwiki.com/scrapers/food_standards_agency/ ) but coming up with…
psychemedia · 5,690
4 votes · 1 answer

How does scraperwiki limit the execution time?

How does scraperwiki decides to stop a scheduled run? Is it based on the actual execution time or the CPU time ? Or maybe something else. I scrape a site for which Mechanize requires 30s to load every page but I use very few CPU to process the…
Christophe · 41
4 votes · 1 answer

What does "exit status 1" mean in ScraperWiki, is it a failure?

A user was getting this message from a scraper run. Run succeeded: - ran 1 times, most recently for 2073 seconds (288 scraped pages, 2 records) 17:45, 5 May 2011 Hide Details …
frabcus · 919
2 votes · 1 answer

ScraperWiki: How to create and add records with autoincrement key

Anyone know how to create a table with a surrogate key? looking for something like autoincrement, that is just a large integer that automatically adds the next highest unique number as the primary key. Need to know how to create the table as well…
Dragon · 2,017
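ScraperWiki's datastore was backed by SQLite, where a column declared `INTEGER PRIMARY KEY` is an alias for the rowid and auto-assigns the next integer on insert, with no `AUTOINCREMENT` keyword needed. A sketch using the stdlib `sqlite3` module; the table name `swdata` follows ScraperWiki's default convention, and the rows are invented:

```python
import sqlite3

# An INTEGER PRIMARY KEY column in SQLite auto-fills with the next
# rowid when you omit it from the INSERT.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE swdata (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO swdata (name) VALUES (?)", ("alice",))
con.execute("INSERT INTO swdata (name) VALUES (?)", ("bob",))

rows = con.execute("SELECT id, name FROM swdata ORDER BY id").fetchall()
print(rows)  # [(1, 'alice'), (2, 'bob')]
```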
2 votes · 1 answer

Does ScraperWiki rate limit sites it is scraping?

Does ScraperWiki somehow automatically rate limit scraping, or should I add something like sleep(1 * random.random()) to the loop?
frabcus · 919
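The jittered-sleep idea from the question can be wrapped in a small helper so every request in the scrape loop waits a random fraction of a base delay, rather than hammering the site in lockstep. A minimal sketch; the function name and `base` parameter are illustrative, not part of any ScraperWiki API:

```python
import random
import time

def polite_sleep(base=1.0):
    # Sleep for a random fraction of `base` seconds between requests,
    # as the question suggests; the jitter spreads requests out.
    delay = base * random.random()
    time.sleep(delay)
    return delay

# Tiny base here just to keep the example fast.
d = polite_sleep(0.01)
```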
2 votes · 2 answers

TypeError: must be convertible to a buffer, not ResultSet

I am trying to convert a PDF into a text file using scraperwiki and bs4. I am getting a TypeError. I am very new at Python and would really appreciate a little assistance. Error occurs here: File "scraper_wiki_download.py", line 53, in…
tonestrike · 320
2 votes · 1 answer

Scraperwiki - python - skipping a table row

I'm trying to scrape a table that uses TH as a leading column element with a following TD tag. The problem is that the table uses intermittent dividers that need to be skipped because they don't contain a TH tag. This is a sample from the…
woodbine · 553
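One common way to handle divider rows is to test each `<tr>` for the presence of a `<th>` before extracting, skipping any row without one. A sketch with the stdlib ElementTree on an invented table; on ScraperWiki the same row test works with lxml:

```python
import xml.etree.ElementTree as ET

# Invented table: divider rows lack a <th>, so we skip any <tr>
# where find("th") returns None.
html = """<table>
<tr><th>Name</th><td>Alpha</td></tr>
<tr><td colspan="2">divider</td></tr>
<tr><th>Code</th><td>Beta</td></tr>
</table>"""

table = ET.fromstring(html)
rows = [(tr.find("th").text, tr.find("td").text)
        for tr in table.findall("tr")
        if tr.find("th") is not None]
print(rows)  # [('Name', 'Alpha'), ('Code', 'Beta')]
```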
2 votes · 2 answers

Scrape a Google Chart script with Scraperwiki (Python)

I'm just getting into scraping with Scraperwiki in Python. Already figured out how to scrape tables from a page, run the scraper every month and save the results on top of each other. Pretty cool. Now I want to scrape this page with information on…
Jerry Vermanen · 297
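Google Chart data usually sits in the page's inline JavaScript as an array literal, so one approach is to pull the literal out with a regex and parse it as JSON. A sketch; the `addRows` snippet below is an invented stand-in for the page's actual script, and real pages may need a more careful pattern:

```python
import json
import re

# Invented stand-in for the page's inline chart script.
script = "data.addRows([[1, 20], [2, 35], [3, 12]]);"

# Capture the array-of-arrays literal inside addRows(...) and let the
# JSON parser turn it into Python lists.
match = re.search(r"addRows\((\[\[.*?\]\])\)", script)
rows = json.loads(match.group(1))
print(rows)  # [[1, 20], [2, 35], [3, 12]]
```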
2 votes · 1 answer

Python scraper (Scraperwiki) only getting half of the table

I'm learning how to write scrapers using Python in Scraperwiki. So far so good, but I have spent a couple of days scratching my head now over a problem I can't get my head around. I am trying to take all links from a table. It works, but from the…
cptasker · 47
1 vote · 2 answers

Why would scraperwiki omit lines from scraped html?

I have a really simple python script on scraperwiki: import scraperwiki import lxml.html html = scraperwiki.scrape("http://www.westphillytools.org/toolsListing.php") print html I haven't written anything to parse it yet... for now I just want the…
maneesha · 685
1 vote · 2 answers

Parsing a numbered transcript into XML

I'm wanting to build a scraper that parses through transcripts from the Leveson Inquiry, which are in the following format as plaintext: 1 Thursday, 2 February 2012 2 (10.00 am) 3 …
aendra · 5,286
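A first step for transcripts in that format is stripping the leading line number from each line with a regex, leaving the text ready for further parsing into XML. A sketch; the sample lines mirror the question's excerpt:

```python
import re

# Sample lines in the transcript's "number, spaces, text" format.
lines = [
    "1  Thursday, 2 February 2012",
    "2  (10.00 am)",
    "3  LORD JUSTICE LEVESON:  Good morning.",
]

# Remove the leading integer and the whitespace around it.
stripped = [re.sub(r"^\s*\d+\s+", "", ln) for ln in lines]
print(stripped[0])  # Thursday, 2 February 2012
```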