7

We offer in-browser JavaScript, similar to ImageMagick, that helps people convert images to different sizes and formats. However, it requires interaction with the web page.

Is it possible to let people automate this interaction -- without sending images to our server (thus increasing bandwidth cost and server load) and without requiring users to download a headless browser library like Puppeteer?

For instance, is the following flow possible:

  1. Open Chrome via the command line (or local script) to a specific web page.
  2. Upload an image to that web page.
  3. Invoke a script on the web page.
  4. Receive the script results and allow for local manipulation.

Launching Chrome is possible, but it's unclear if you can interact with a specific browser window after launching it.

Sheepy
Crashalot
  • @Bauke sorry, will revise and clarify the question. needs to happen client-side with no expectation the user can/will download an additional script. – Crashalot Nov 04 '19 at 17:31
  • Can you add more information? What you are trying to achieve is very unclear. It looks to me like you are trying to create a browser extension to convert images? – Nicolas Nov 04 '19 at 17:32
  • @Nicolas sorry for the confusion. No, not a browser extension. The goal is to let devs hook into this script while minimizing load on our servers/bandwidth. For instance, one approach is to port this browser JavaScript to the server and expose it via an API, but that means our server gets hit with every conversion. Ideally, we allow users to use this script while somehow bypassing our server (beyond the page load). – Crashalot Nov 04 '19 at 17:37
  • What is the context for hooking into this script? Is it a web page? Do we import it via a `script` tag? – Nicolas Nov 04 '19 at 17:50
  • The ideal context is as described in the question, but conceptually, the goal is to let devs reuse this image code (so they don't need to write their own or deal with imagemagick) without burdening our servers so feel free to suggest other contexts if they achieve the conceptual goal. @Nicolas – Crashalot Nov 04 '19 at 18:10
  • Why would you want to do command line work inside a browser? Why are you not working with node.js instead? – MauriceNino Nov 12 '19 at 13:26
  • @MauriceNino good questions. because the goal is to provide a free service to developers so they don't need to (1) roll their own or (2) fork/manage extra code. any suggestions? – Crashalot Nov 13 '19 at 00:13
  • Provide your code via a plain `.js` script that you serve. The developers can include it with a script tag in their site or download and use it locally. The only thing you need to do is to provide the script. – MauriceNino Nov 13 '19 at 07:27
  • You can also provide it as a npm module. Then the users can conveniently integrate your code into their products. If you want to make sure that users can use it from the CLI, you can provide an interface like this: https://blog.bitsrc.io/how-to-build-a-command-line-cli-tool-in-nodejs-b8072b291f81 – MauriceNino Nov 13 '19 at 07:46
  • those are both good suggestions, but they still require devs patching the code on updates. maybe we'll just bite the cost on bandwidth charges but force processing to happen client side in the browser. – Crashalot Nov 13 '19 at 09:23
  • How does the first suggestion force devs to update anything? If you just provide a js file and devs include this file via ` – MauriceNino Nov 13 '19 at 12:05
  • @MauriceNino great point about the js file, though, it does mean we have to maintain the same API or code may break. the website wasn't mentioned to avoid perceptions of potential spam. happy to privately share more information, though -- okay to contact you? – Crashalot Nov 13 '19 at 21:25
  • You should be able to interact with chrome headless... but if you don't want them to bundle puppeteer, then you have to write the interactions by hand yourself... which is even worse – Christopher Francisco Nov 14 '19 at 00:54
  • That is the same for a CLI program. You need to maintain the API, or any program will break. Not just for plain JS files. Sure write me. – MauriceNino Nov 14 '19 at 08:17
  • @MauriceNino no contact info in your profile? – Crashalot Nov 14 '19 at 09:58
  • https://meta.stackexchange.com/questions/231544/how-to-start-chat-with-a-particular-user – MauriceNino Nov 14 '19 at 11:44
  • @MauriceNino just sent a message, thanks! – Crashalot Nov 15 '19 at 09:23

3 Answers

5

This should be technically automatable, but it is far from straightforward.

Your question can be split into two parts: offline processing and upload automation.


Offline Processing

Assuming your image processing code is fully in-browser JavaScript (instead of, say, a modularized node program calling native libraries), it is possible to do all the processing in-browser.

An "uploaded" file can be read, processed, and downloaded without sending anything to the server. The processing can even happen in a background thread, keeping the UI responsive enough to show, say, a nice progress bar.

The code itself can be hosted online using a Service Worker, or shipped as static HTML + JavaScript. Both can be opened and executed offline once visited or deployed. (Note that Chrome severely limits static HTML, including a harsh restriction on web workers. Google prefers that you keep things online.)
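
To make the "read, process, download" loop concrete, here is a minimal sketch of it using standard browser APIs (`createImageBitmap`, canvas `toBlob`, object URLs). The function names, the `#picker` element id, and the 512-pixel target are assumptions chosen for illustration, not part of the asker's actual code.

```javascript
// Pure helper: shrink width x height to fit inside maxSide, keeping the aspect ratio.
// Images that already fit are left at their original size.
function fitWithin(width, height, maxSide) {
  const scale = Math.min(1, maxSide / Math.max(width, height));
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// Browser-only: decode a user-selected File, draw it onto a canvas at the
// reduced size, and resolve with a PNG Blob. Nothing is sent over the network.
async function resizeToPng(file, maxSide) {
  const bitmap = await createImageBitmap(file);
  const size = fitWithin(bitmap.width, bitmap.height, maxSide);
  const canvas = document.createElement('canvas');
  canvas.width = size.width;
  canvas.height = size.height;
  canvas.getContext('2d').drawImage(bitmap, 0, 0, size.width, size.height);
  return new Promise((resolve) => canvas.toBlob(resolve, 'image/png'));
}

// Browser-only: hand the Blob back to the user as a download.
function saveBlob(blob, filename) {
  const link = document.createElement('a');
  link.href = URL.createObjectURL(blob);
  link.download = filename;
  link.click();
  // Release the object URL once the download has had time to start.
  setTimeout(() => URL.revokeObjectURL(link.href), 1000);
}

// Wire everything up only when a DOM is present, so the file also loads under Node.
if (typeof document !== 'undefined') {
  // Hypothetical element: <input type="file" id="picker" multiple>
  const input = document.querySelector('#picker');
  if (input) {
    input.addEventListener('change', async () => {
      for (const file of input.files) {
        const blob = await resizeToPng(file, 512);
        saveBlob(blob, file.name.replace(/\.\w+$/, '') + '-512.png');
      }
    });
  }
}
```

The user still has to pick the file and accept the download; that manual step is exactly what the next section is about.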


Upload Automation

As mentioned above, a file selected through a file input or dropped into the browser can be read by in-page JavaScript, but I'll keep calling it an "upload" by convention.

Chrome has some automation extensions, most notably Kantu, but they can't handle file uploads because of Chrome's security restrictions.

So, if you want to automate file selection, you need a native, out-of-browser automation tool, such as Kantu's XModules, AutoHotkey, or SikuliX. Commercial solutions exist, but they come with similar restrictions given your unusual requirement of no headless browser.

  • AutoHotkey focuses on simulating the keyboard (open the browser, wait 5 seconds, press Tab 10 times, press Enter, wait 2 seconds, type the file name, press Enter, and so on), and the script can be compiled into a deployable exe.

  • SikuliX is more powerful, but also much harder to distribute; the Java runtime alone is bigger than a browser.

  • Kantu + XModules sits somewhere between the two. Users need to install the browser extension and its native extension, but once that is done everything happens in the browser (more or less).

All three methods involve simulation of typing the file name, because as far as I know there is no simpler way to automate it in a user-launched (non-headless) Chrome.

The name of the image file can be passed as a command-line parameter to AutoHotkey and SikuliX, or stored in a file and read by the script in the case of Kantu.

In all three cases, the automation simulates a user, and the real-life user must not touch the computer while the script is running, or the automation will break.


How about command line?

Alternatively, if your aim is automation without deploying a browser, you may consider making it a command-line Node.js program and packaging it as an exe.

The distributable would be heavier than a compiled AutoHotkey script, but there are far fewer moving parts, so it is much more reliable:

  • Independent from Chrome version or the existence of XModules.
  • All processing happens in its own process, instead of hijacking the user's Chrome.
  • Can be executed headlessly, very important for automation.
  • Flexible command line parameters.
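
As a rough idea of what "flexible command line parameters" could look like, here is a sketch of the argument handling for such a Node.js tool, using Node's built-in `util.parseArgs`. The option names (`--width`, `--format`) and the output naming scheme are hypothetical, and the actual image conversion is left out because it depends on how the existing browser code is ported.

```javascript
const { parseArgs } = require('node:util');
const path = require('node:path');

// Turn an argv array into a list of conversion jobs.
// Hypothetical usage: convert --width 64 --format webp logo.svg icon.png
function parseCli(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      width: { type: 'string', short: 'w' },
      format: { type: 'string', short: 'f', default: 'png' },
    },
    allowPositionals: true,
  });

  const width = Number(values.width);
  if (!Number.isInteger(width) || width <= 0) {
    throw new Error('--width must be a positive integer');
  }
  if (positionals.length === 0) {
    throw new Error('no input files given');
  }

  // One job per input file; the output sits next to the input.
  return positionals.map((input) => {
    const { dir, name } = path.parse(input);
    return {
      input,
      width,
      output: path.join(dir, `${name}-${width}w.${values.format}`),
    };
  });
}
```

A real entry point would call `parseCli(process.argv.slice(2))` and feed each job to the conversion routine; since nothing here touches a window or the keyboard, the same command runs unchanged in a script or a CI job.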

But I like browser automation, it is so simple

Think again.

In my experience, many things will throw browser/GUI automation off:

  1. Unusual screen resolution, browser zoom, OS scaling, or a last-remembered Chrome window size that distorts your page beyond recognition.
  2. Browser extensions that change page elements, such as ad blockers.
  3. IMEs and other programs that intercept keyboard input with hotkeys.
  4. Pop-ups from other programs, such as antivirus or Windows Update, or from inserting a CD.
  5. Accidental locks, sleeps, logouts, keys left pressed on the keyboard, or power interruptions.
  6. Or a simple Chrome update that breaks any of the 100 things you depend on.

So, yeah, here are your reasons why computer automation is better done headless.


Will my code be safe?

In case you are worried about the security of your script, don't worry. The moment you want the processing to happen client-side, the cat is out of the bag.

Technically, your code is protected by copyright. But good luck enforcing it. If you want to keep your code safe from extraction/decryption/deobfuscation/whatever (cough), you need to keep it an online black box, with no client-side processing.

Sheepy
  • thanks so much! not worried about code theft, just want to make this service available without additional server/bandwidth costs (beyond loading a page) and easier to consume than releasing the code open-source. unfortunately, it doesn't sound like this is possible as your solutions require extra downloads. once the page is loaded, the service should theoretically work offline as long as there's someway to interact with the page javascript. – Crashalot Nov 07 '19 at 21:45
  • @Crashalot Client processing seems to be the easy part in your case. It's the upload requirement that is problematic. Browsers are pretty secure now after decades of abuse, thus there is no way to read local image files (or cross site images) from within the browser. Chrome is experimenting with [Native File System API](https://web.dev/native-file-system/), but I consider it a very high risk feature so I advise against depending on it. There is also [Chrome App](https://developer.chrome.com/webstore/hosted_apps), but I lack experience in them. – Sheepy Nov 08 '19 at 03:25
  • yes, client processing is easy. the trick is transferring the original and converted images without incurring bandwidth costs. if you think of something else, please share. otherwise, thanks for your help so far! – Crashalot Nov 08 '19 at 04:26
  • @Crashalot Sorry, we seem to be using different terms. Assuming your code is JavaScript, it is easy, almost trivial, to do them in the browser. This [data uri converter](https://www.site24x7.com/tools/image-to-datauri.html) is an example. My answer has links to reading and outputting files in the Offline Processing part, as long as the files are loaded and saved by the user. Is this what you want to do? – Sheepy Nov 08 '19 at 08:59
  • yes the code is in JS, and it is trivial. unfortunately, there are resizing needs for different platforms and devs often write this trivial code themselves. It's just an annoyance no one should deal with so we want to offer our version for free as a service so people can concentrate on code that matters. one option is to open-source the code, but we want to make it even simpler and offer a service so people don't need to deal with updates or installing the code in the first place. the challenge is not getting hammered by bandwidth charges. – Crashalot Nov 08 '19 at 22:04
  • @Crashalot No. If you mean things like mobile app icons, you can [resize](https://stackoverflow.com/questions/19262141/resize-image-with-javascript-canvas-smoothly), [export png](https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement/toDataURL), and [zip them](https://stuk.github.io/jszip/) in the browser, no need to go to server. What _exactly_ is the problem you face? Automated upload or offline processing? If you don't know what your problem is, open sourcing it may be the better option. – Sheepy Nov 09 '19 at 00:58
  • The problem is offering common image operations (e.g., resizing for iOS 1x 2x 3x and Android ldpi, hdpi, etc., transforming from SVG to PNG, etc.) as a service so devs don't need to roll their own. Even offering the code as open source requires devs to download the code, install it, and patch any changes. We just want to make this available as a service, but since it's free, need also to avoid/minimize bandwidth and server charges somehow. Thanks for your help! – Crashalot Nov 09 '19 at 06:02
2

One way to build around your web app would be:

1) redirect console.log to standard out (see here: In Chrome, how can I get the javascript console output to stdout/stderr ), probably with the appropriate --log-level flag and with error messages redirected somewhere else, so that some random message doesn't break the whole thing,

2) at the script level, instead of (or in addition to) saving the result file, console.log it as Base64,

3) and on the CLI side, use a pipe (or pipes) that turns the Base64 back into a proper file (plus any additional processing).
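
Step 3 could be sketched as a small Node.js filter like the one below. It assumes the page script prints the result as a single line starting with a marker such as `RESULT_BASE64:` (a hypothetical convention, not something Chrome defines), and that Chrome's log output may wrap that line in its own prefix and suffix. Like the idea itself, this is untested against a real Chrome log.

```javascript
// Hypothetical prefix that the page script prints before the Base64 data,
// e.g. console.log('RESULT_BASE64:' + base64String).
const MARKER = 'RESULT_BASE64:';

// Scan captured log text for the marker and decode what follows it.
// Returns a Buffer with the file bytes, or null if no marker line is found.
function extractPayload(logText) {
  for (const line of logText.split(/\r?\n/)) {
    const at = line.indexOf(MARKER);
    if (at === -1) continue;
    // Log lines may carry trailing decoration (quotes, source location),
    // so keep only the Base64 characters directly after the marker.
    const match = /^[A-Za-z0-9+/]+={0,2}/.exec(line.slice(at + MARKER.length));
    if (match) return Buffer.from(match[0], 'base64');
  }
  return null;
}

// When run with --stdin, read the whole log from standard input and write
// the decoded bytes to standard output so the shell can redirect them.
if (process.argv[2] === '--stdin') {
  let log = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { log += chunk; });
  process.stdin.on('end', () => {
    const bytes = extractPayload(log);
    if (!bytes) {
      console.error('No ' + MARKER + ' line found in the log');
      process.exitCode = 1;
      return;
    }
    process.stdout.write(bytes);
  });
}
```

With the browser's log piped in, the invocation would be something like `<chrome command> 2>&1 | node decode-log.js --stdin > out.png`, where `decode-log.js` is just a placeholder file name. Large images produce very long log lines, so whether the logging path preserves them intact would need checking.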

mbojko
  • this is fascinating. have you tried this before? no gotchas or potential concerns with this approach? – Crashalot Nov 07 '19 at 21:47
  • No, it's just me tossing ideas. – mbojko Nov 07 '19 at 21:55
  • i see. it's a clever idea. :) do you see a way to upload the image to the webpage while bypassing the server? POSTing the image in a curl command works, but it causes us to incur bandwidth charges. – Crashalot Nov 07 '19 at 23:57
1

All this is possible with PowerShell. Using PowerShell, you can open a browser (IE is much easier to drive from PowerShell, as it is natively supported). You can open a webpage, fill out a form, download or upload data, get objects, inspect them, etc.

Visit below webpage for more details:

Hope this helps.

S.S.Prabhu