
I have a really simple Python script on ScraperWiki:

import scraperwiki
import lxml.html

html = scraperwiki.scrape("http://www.westphillytools.org/toolsListing.php")
print html

I haven't written anything to parse it yet; for now I just want the HTML.

When I run it in edit mode it works perfectly.

When a scheduled scrape runs (or I run it manually), the output omits dozens (or even hundreds) of lines.

It's a very small webpage so data overload shouldn't be a problem. Any ideas?

maneesha
  • Are you sure it's not an artefact of how printing is handled on ScraperWiki? – Marcin Mar 07 '12 at 14:39
  • Not sure... I get a line in the middle of my HTML output that reads like this (brackets included; the actual numbers vary each time): [53 lines, 159000 characters omitted] – maneesha Mar 07 '12 at 14:43
  • I can't find anything about it in the ScraperWiki documentation. – maneesha Mar 07 '12 at 14:44
  • Interesting! Do you need the output for something, or are you just curious about how ScraperWiki works and when it truncates output? – frabcus Mar 07 '12 at 16:16

2 Answers


It sounds like the data are there in your variable. Try printing it a line at a time.
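As a minimal sketch of that suggestion: the snippet below prints a string one line at a time. The `html` value here is a small stand-in for whatever `scraperwiki.scrape(...)` returned in the question, and `print_by_line` is a hypothetical helper name, not part of ScraperWiki. The single-argument `print(...)` form behaves the same under Python 2 and Python 3.

```python
# Stand-in for the string returned by scraperwiki.scrape(...) in the question.
html = "<html>\n<body>\n<p>Tool listing</p>\n</body>\n</html>"


def print_by_line(text):
    """Print text one line at a time and return how many lines were printed."""
    count = 0
    # splitlines() handles \n, \r\n and bare \r line endings.
    for line in text.splitlines():
        print(line)
        count += 1
    return count


line_count = print_by_line(html)
```

If the count matches what you expect, the data is all there in the variable and only the console display is being shortened.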

Marcin

In the editor, individual print statements are rolled up into one line for display. You can click "more..." in the console on the editor to view the whole lot.

In a scheduled run, the output is written exactly as it would be in any console. So if the HTML contains line breaks, you'll get lots of lines of output.

To reduce the amount of output we store, we truncate large outputs from scheduled runs. That's where you've seen "[53 lines, 159000 characters omitted]".
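To make the shape of that marker concrete, here is a purely illustrative sketch of this kind of truncation. It is not ScraperWiki's actual code: the function name `truncate_output` and the `head`/`tail` limits are assumptions made up for the example, and the real thresholds are not documented in this thread.

```python
def truncate_output(text, head=2, tail=2):
    """Keep the first `head` and last `tail` lines; summarise what was dropped.

    Hypothetical illustration only; the limits are invented for this example.
    """
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # short output is left untouched
    omitted = lines[head:len(lines) - tail]
    omitted_chars = sum(len(line) for line in omitted)
    marker = "[%d lines, %d characters omitted]" % (len(omitted), omitted_chars)
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])


sample = "\n".join("line %d" % i for i in range(10))
shortened = truncate_output(sample)
print(shortened)
```

The point is only that the missing lines are dropped from the stored log, not from the variable the script holds.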

Stdout from scheduled runs is only really intended for debugging. You need to save any output you want to use to the datastore.
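A rough sketch of the idea, using Python's standard `sqlite3` module as a stand-in so it runs anywhere. The table name `pages` and the helpers `save_page` and `load_page` are hypothetical. On ScraperWiki itself the equivalent would, as far as I know, be a single call along the lines of `scraperwiki.sqlite.save(unique_keys=["url"], data={"url": url, "html": html})`.

```python
import sqlite3


def save_page(conn, url, html):
    """Store the full HTML for a URL, replacing any earlier copy."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", (url, html)
    )
    conn.commit()


def load_page(conn, url):
    """Return the stored HTML for a URL, or None if nothing was saved."""
    row = conn.execute(
        "SELECT html FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None


# In-memory database as a stand-in for the scraper's datastore.
conn = sqlite3.connect(":memory:")
url = "http://www.westphillytools.org/toolsListing.php"
save_page(conn, url, "<html><body>sample</body></html>")
```

Saved this way, the page is stored whole regardless of how much of the console log is kept.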

frabcus
  • Thanks... I didn't know that you couldn't just store the entire HTML. – maneesha Mar 08 '12 at 13:47
  • Not sure what you mean by store... the stored stdout from a scheduled run is just meant for debugging. You can store other stuff in the SQLite database... – frabcus Mar 09 '12 at 15:03