An awful lot of modern web traffic (particularly on social media) consists of screenshots from web browsers. These typically include some formatted text, some layout, and some bitmap/vector graphics. E.g.,

https://www.reddit.com/r/BlackPeopleTwitter/comments/5863xx/pray_4_orlando/

It's really easy to take and share a screenshot, but it throws away lots of useful information and doesn't transfer well between devices (not to mention being far less amenable to things like screen readers for the blind and fancy data-mining). Of course the ironic part of this is that HTML/SVG is the perfect format for representing such data, and we're not using it even though it's right there.

html2canvas comes close to doing this, but it doesn't handle images properly; see some semi-related discussion here.

My question is this: how can I select a visible area in my browser and save it in a format (ideally HTML) that preserves text and images and looks roughly the same when rendered separately (so that it could be included as, e.g., a data iframe for sharing)?

I know that this is in general impossible, and that rendering HTML is a complicated task, but I feel like it should be possible to ask the browser something like "what elements are being rendered within these pixel coordinates?".
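
One way to approximate the "what elements are being rendered within these pixel coordinates?" query is to compare each element's bounding box with the selected rectangle. The sketch below is an assumption about how that could look, not a built-in browser feature: `Rect`, `RectSource`, `intersects`, and `elementsInRegion` are hypothetical names. In a browser, DOM elements already provide `getBoundingClientRect()`, and `document.elementsFromPoint(x, y)` answers the same question for a single point. Bounding boxes ignore clipping, stacking order, and hidden elements, so the result is only a rough candidate list.

```typescript
// Minimal structural types so the sketch also runs outside a browser.
// DOM elements satisfy RectSource through getBoundingClientRect().
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

interface RectSource {
  getBoundingClientRect(): Rect;
}

// True when the two rectangles overlap in a region of non-zero area.
function intersects(a: Rect, b: Rect): boolean {
  return a.left < b.right && b.left < a.right && a.top < b.bottom && b.top < a.bottom;
}

// Hypothetical helper (not a browser API): keep the candidates whose
// bounding box overlaps the selected region, both in viewport coordinates.
function elementsInRegion<T extends RectSource>(candidates: ArrayLike<T>, region: Rect): T[] {
  const hits: T[] = [];
  for (let i = 0; i < candidates.length; i++) {
    const candidate = candidates[i];
    if (candidate !== undefined && intersects(candidate.getBoundingClientRect(), region)) {
      hits.push(candidate);
    }
  }
  return hits;
}
```

In a page this could be called as `elementsInRegion(document.querySelectorAll("body *"), region)`, with `region` given in viewport coordinates. The matching elements would still need their styles and images collected before they render the same way elsewhere.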

Sean D
  • Possible lead: [Tools to selectively copy HTML+CSS+JS..](http://stackoverflow.com/q/4911338/5496966) – AA2992 Nov 02 '16 at 09:58
  • Thanks for the link @AnkithAmtange, I had a look and it seems that all those tools just select DOM elements, so you'd have to do something on top of that to get the "screenshot", but it's a good start. – Sean D Nov 02 '16 at 22:54
  • PhantomJS can do this (see http://phantomjs.org/screen-capture.html). You would have to wrap it in an extension somehow, although I don't think something like this would be very useful. Images render the same on any device, and if you want to extract the text then OCR tools can easily do the job. I ran your image through http://www.onlineocr.net/ and got most of the text. – Dan Wilson Nov 06 '16 at 13:41
  • @user3608792 , Good catch. I think I can take the pdf output of `phantomjs` and convert it to svg. I'd normally try to admonish you for unhelpfully telling me the idea wasn't good, but you solved my problem, so thanks very much. – Sean D Nov 07 '16 at 17:34
  • This seems unrelated to programming. Maybe inspect element and remove all elements but the ones that highlight what you want to keep. –  Nov 08 '16 at 04:17
  • @AllDani.com perhaps there could be some sort of formal way to instruct the computer to perform such a task? – Sean D Nov 09 '16 at 22:54
  • What task, exactly? –  Nov 09 '16 at 23:52
  • @AllDani.com, the task of, for example, removing unwanted elements from the DOM. While you claim that this is unrelated to programming, I would suggest that this is a perfectly good problem to solve by writing a programme, and was asking how one would go about it. You'll see the above comments are helping with this. – Sean D Nov 10 '16 at 17:53
  • Please see [mcve]. SO doesn't make code from scratch for people. Rather, SO helps fix existing code. –  Nov 10 '16 at 21:01
  • Jesus man. That applies to "When asking a question about a problem caused by your code". I don't want someone to write code, I want someone to describe a solution, as I have explained. It's fine if you don't know how to do that or don't want to. – Sean D Nov 11 '16 at 08:54

1 Answer

First:

  • Right-click on the page, then click "Save page as".
  • Save it with a name that ends in .html (or .webarchive in some scenarios; see which works best for you).
  • Edit the saved HTML file so it contains only the part you want. Any text editor will do; Sublime Text and Atom are usually suggested.

Then:

  • You can open it in your browser to check the result.
  • You might also want to find where the CSS comes from, copy those files into the same folder as the HTML file, and link the HTML file to them so the styles are preserved.

As far as I understand, you'd want to inline all the CSS, or at least move it into the <head> section of the HTML file, so you can upload a single file and don't have to keep it linked to separate CSS files.
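
Moving the CSS into the HTML file can be scripted. The sketch below is only an illustration under stated assumptions: `inlineStylesheets` is a hypothetical helper, the CSS text for each `href` has already been read by the caller, and the regex matching is only good enough for simple saved pages (a real tool would use an HTML parser). It does not handle CSS that itself contains a closing style tag.

```typescript
// Hypothetical helper: replace each <link rel="stylesheet" href="..."> tag
// with a <style> block, using CSS text the caller has already loaded.
// Tags that are not stylesheets, or whose href is not in cssByHref,
// are left untouched.
function inlineStylesheets(html: string, cssByHref: Record<string, string>): string {
  const linkTag = /<link\b[^>]*>/gi;
  return html.replace(linkTag, (tag: string): string => {
    // Only handle stylesheet links (assumes a plain rel="stylesheet").
    if (!/\brel\s*=\s*["']?stylesheet\b/i.test(tag)) {
      return tag;
    }
    // Assumes the href attribute value is quoted.
    const match = /\bhref\s*=\s*["']([^"']*)["']/i.exec(tag);
    const key = match === null ? undefined : match[1];
    if (key === undefined || !Object.prototype.hasOwnProperty.call(cssByHref, key)) {
      return tag;
    }
    return "<style>\n" + cssByHref[key] + "\n</style>";
  });
}
```

For example, calling `inlineStylesheets(savedHtml, { "site.css": cssText })` would swap the `site.css` link for a `<style>` block and leave every other tag as it was; `savedHtml` and `cssText` are placeholder names here.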

  • I'm sorry if I wrote the question in an unclear way, but this is not at all what I meant. – Sean D Nov 10 '16 at 17:50
  • @SeanD This adequately answers the question as-written. Please clarify how this does not address your question in your opinion. – Sean Apr 30 '21 at 04:49