Questions tagged [scraperwiki]

ScraperWiki was an online tool for screen scraping.

ScraperWiki was a platform for writing and scheduling screen scrapers, and for storing the data they generated. It supported Ruby, Python and PHP. A later version of the service was called QuickCode, which has also been decommissioned.

"Scraper" refers to screen scrapers, programs that extract data from websites. "Wiki" means that any user with programming experience can create or edit such programs for extracting new data, or for analyzing existing datasets.

68 questions
61 votes · 15 answers

How to install Poppler on Windows?

The most recent version of ScraperWiki depends on Poppler (or so the GitHub says). Unfortunately, it only specifies how to get it on macOS and Linux, not Windows. A quick googling turned up nothing too promising. Does anyone know how to get Poppler…
akkatracker · 1,397
9 votes · 3 answers

Encoding error while parsing RSS with lxml

I want to parse downloaded RSS with lxml, but I don't know how to handle with UnicodeDecodeError? request = urllib2.Request('http://wiadomosci.onet.pl/kraj/rss.xml') response = urllib2.urlopen(request) response = response.read() encd =…
domi · 189
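The usual fix for this class of error is to hand the raw bytes straight to the XML parser rather than decoding them yourself, so the parser can honour the encoding declared in the feed. A minimal sketch with the stdlib ElementTree (`lxml.etree.fromstring` accepts bytes the same way); the feed content below is invented, with ISO-8859-2 chosen because it is a common encoding for Polish feeds like the one in the question:

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the downloaded feed: bytes in a non-UTF-8
# encoding, with that encoding declared in the XML prolog.
raw = ('<?xml version="1.0" encoding="ISO-8859-2"?>'
       '<rss><channel><title>Wiadomo\u015bci</title></channel></rss>'
       ).encode("iso-8859-2")

try:
    raw.decode("utf-8")     # decoding by hand with the wrong codec...
except UnicodeDecodeError:
    pass                    # ...is what raises the question's error

# Passing the *bytes* lets the parser read the declared encoding itself.
root = ET.fromstring(raw)
title = root.find("channel/title").text
print(title)  # Wiadomości
```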
7 votes · 1 answer

Can I install the "scraperwiki" library locally?

Is the scraperwiki python module available for install outside of the Scraperwiki.com web interface? It looks like the source is available, but not packaged.
Amanda · 12,099
7 votes · 2 answers

Scraperwiki + lxml. How to get the href attribute of a child of an element with a class?

On the link that contains 'alpha' in the URL has many links (hrefs) which I would like to collect from 20 different pages and paste onto the end of the general url (second last line). The href are found in a table which class is mys-elastic mys-left…
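The pattern being asked about is "find cells with a given class, then read the `href` of the child link". A sketch with the stdlib ElementTree on an invented table fragment; on ScraperWiki the equivalent lxml XPath would be `doc.xpath('//td[@class="mys-elastic mys-left"]/a/@href')`. Note the class string is compared verbatim here, which only works when the attribute matches exactly:

```python
import xml.etree.ElementTree as ET

# Invented fragment mirroring the question's table: some cells carry
# the target class, divider cells do not.
frag = """<table>
<tr><td class="mys-elastic mys-left"><a href="/event/1">one</a></td></tr>
<tr><td class="other"><a href="/skip">no</a></td></tr>
<tr><td class="mys-elastic mys-left"><a href="/event/2">two</a></td></tr>
</table>"""

root = ET.fromstring(frag)
# Keep only cells whose class attribute matches, then read the child
# <a>'s href attribute from each.
hrefs = [td.find("a").get("href")
         for td in root.iter("td")
         if td.get("class") == "mys-elastic mys-left"]
print(hrefs)  # ['/event/1', '/event/2']
```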
6 votes · 2 answers

Screen scraping aspx with Python Mechanize - Javascript form submission

I'm trying to scrape UK Food Ratings Agency data aspx seach results pages (e.,g http://ratings.food.gov.uk/QuickSearch.aspx?q=po30 ) using Mechanize/Python on scraperwiki ( http://scraperwiki.com/scrapers/food_standards_agency/ ) but coming up with…
psychemedia · 5,690
4 votes · 1 answer

How does scraperwiki limit the execution time?

How does scraperwiki decides to stop a scheduled run? Is it based on the actual execution time or the CPU time ? Or maybe something else. I scrape a site for which Mechanize requires 30s to load every page but I use very few CPU to process the…
Christophe · 41
4 votes · 1 answer

What does "exit status 1" mean in ScraperWiki, is it a failure?

A user was getting this message from a scraper run. Run succeeded: - ran 1 times, most recently for 2073 seconds (288 scraped pages, 2 records) 17:45, 5 May 2011 Hide Details …
frabcus · 919
2 votes · 1 answer

ScraperWiki: How to create and add records with autoincrement key

Anyone know how to create a table with a surrogate key? looking for something like autoincrement, that is just a large integer that automatically adds the next highest unique number as the primary key. Need to know how to create the table as well…
Dragon · 2,017
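ScraperWiki's datastore was backed by SQLite, where a column declared `INTEGER PRIMARY KEY` is an alias for the rowid and auto-assigns the next integer on insert, with no `AUTOINCREMENT` keyword needed. A sketch using the stdlib `sqlite3` module; the table name `swdata` follows ScraperWiki's default convention, and the rows are invented:

```python
import sqlite3

# An INTEGER PRIMARY KEY column in SQLite auto-fills with the next
# rowid when you omit it from the INSERT.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE swdata (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO swdata (name) VALUES (?)", ("alice",))
con.execute("INSERT INTO swdata (name) VALUES (?)", ("bob",))

rows = con.execute("SELECT id, name FROM swdata ORDER BY id").fetchall()
print(rows)  # [(1, 'alice'), (2, 'bob')]
```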
2 votes · 1 answer

Does ScraperWiki rate limit sites it is scraping?

Does ScraperWiki somehow automatically rate limit scraping, or should I add something like sleep(1 * random.random()) to the loop?
frabcus · 919
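The jittered-sleep idea from the question can be wrapped in a small helper so every request in the scrape loop waits a random fraction of a base delay, rather than hammering the site in lockstep. A minimal sketch; the function name and `base` parameter are illustrative, not part of any ScraperWiki API:

```python
import random
import time

def polite_sleep(base=1.0):
    # Sleep for a random fraction of `base` seconds between requests,
    # as the question suggests; the jitter spreads requests out.
    delay = base * random.random()
    time.sleep(delay)
    return delay

# Tiny base here just to keep the example fast.
d = polite_sleep(0.01)
```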
2 votes · 2 answers

TypeError: must be convertible to a buffer, not ResultSet

I am trying to convert a PDF into a text file using scraperwiki and bs4. I am getting a TypeError. I am very new at Python and would really appreciate a little assistance. Error occurs here: File "scraper_wiki_download.py", line 53, in…
tonestrike · 320
2 votes · 1 answer

Scraperwiki - python - skipping a table row

I'm trying to scrape a table that uses TH as a leading column element with a following TD tag. The problem is that the table uses intermittent dividers that need to be skipped because they don't contain a TH tag. This is a sample from the…
woodbine · 553
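One common way to handle divider rows is to test each `<tr>` for the presence of a `<th>` before extracting, skipping any row without one. A sketch with the stdlib ElementTree on an invented table; on ScraperWiki the same row test works with lxml:

```python
import xml.etree.ElementTree as ET

# Invented table: divider rows lack a <th>, so we skip any <tr>
# where find("th") returns None.
html = """<table>
<tr><th>Name</th><td>Alpha</td></tr>
<tr><td colspan="2">divider</td></tr>
<tr><th>Code</th><td>Beta</td></tr>
</table>"""

table = ET.fromstring(html)
rows = [(tr.find("th").text, tr.find("td").text)
        for tr in table.findall("tr")
        if tr.find("th") is not None]
print(rows)  # [('Name', 'Alpha'), ('Code', 'Beta')]
```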
2 votes · 2 answers

Scrape a Google Chart script with Scraperwiki (Python)

I'm just getting into scraping with Scraperwiki in Python. Already figured out how to scrape tables from a page, run the scraper every month and save the results on top of each other. Pretty cool. Now I want to scrape this page with information on…
Jerry Vermanen · 297
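Google Chart data usually sits in the page's inline JavaScript as an array literal, so one approach is to pull the literal out with a regex and parse it as JSON. A sketch; the `addRows` snippet below is an invented stand-in for the page's actual script, and real pages may need a more careful pattern:

```python
import json
import re

# Invented stand-in for the page's inline chart script.
script = "data.addRows([[1, 20], [2, 35], [3, 12]]);"

# Capture the array-of-arrays literal inside addRows(...) and let the
# JSON parser turn it into Python lists.
match = re.search(r"addRows\((\[\[.*?\]\])\)", script)
rows = json.loads(match.group(1))
print(rows)  # [[1, 20], [2, 35], [3, 12]]
```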
2 votes · 1 answer

Python scraper (Scraperwiki) only getting half of the table

I'm learning how to write scrapers using Python in Scraperwiki. So far so good, but I have spent a couple of days scratching my head now over a problem I can't get my head around. I am trying to take all links from a table. It works, but from the…
cptasker · 47
1 vote · 2 answers

Why would scraperwiki omit lines from scraped html?

I have a really simple python script on scraperwiki: import scraperwiki import lxml.html html = scraperwiki.scrape("http://www.westphillytools.org/toolsListing.php") print html I haven't written anything to parse it yet... for now I just want the…
maneesha · 685
1 vote · 2 answers

Parsing a numbered transcript into XML

I'm wanting to build a scraper that parses through transcripts from the Leveson Inquiry, which are in the following format as plaintext: 1 Thursday, 2 February 2012 2 (10.00 am) 3 …
aendra · 5,286
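A first step for transcripts in that format is stripping the leading line number from each line with a regex, leaving the text ready for further parsing into XML. A sketch; the sample lines mirror the question's excerpt:

```python
import re

# Sample lines in the transcript's "number, spaces, text" format.
lines = [
    "1  Thursday, 2 February 2012",
    "2  (10.00 am)",
    "3  LORD JUSTICE LEVESON:  Good morning.",
]

# Remove the leading integer and the whitespace around it.
stripped = [re.sub(r"^\s*\d+\s+", "", ln) for ln in lines]
print(stripped[0])  # Thursday, 2 February 2012
```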