
Currently, I have a few (unpublished) Python packages in local use, which I install (for development purposes) with a Bash script on Linux into an activated (otherwise "empty") virtual environment in the following manner:

cd /root/of/python/package
pip install -r requirements_python.txt # includes "nodeenv"
nodeenv -p # pulls node.js and integrates it into my virtual environment
npm i -g npm # update npm itself ...
cat requirements_node.txt | xargs npm install -g # install node.js dependencies, one name per line
pip install -e . # install my package in development (editable) mode

The background is that I have a number of node.js dependencies (JavaScript CLI scripts) which are called by my Python code.
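
For illustration, my Python code invokes those scripts roughly like in the following minimal sketch (the tool name "some-js-tool" and its "--json" flag are placeholders, not actual dependencies of mine):

# minimal sketch: calling an npm-installed CLI script from Python
# "some-js-tool" is a hypothetical placeholder for an actual node.js tool
import json
import subprocess

def run_js_tool(payload):
    result = subprocess.run(
        ["some-js-tool", "--json"],  # resolved via PATH inside the virtual environment
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on non-zero exit
    )
    return json.loads(result.stdout)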

Pros of current approach:

  • dead simple: relies on nodeenv for all required plumbing
  • can theoretically be implemented within setup.py with subprocess.Popen etc. (see the sketch after these lists)

Cons of current approach:

  • Unix-like platforms with Bash only
  • "hard" to distribute my packages, say on PyPI
  • requires a virtual environment
  • has potentially "interesting" side effects if a package is installed globally
  • potentially interferes with a pre-existing configuration / "deployment" of nodeenv in the current virtual environment
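
To illustrate the setup.py point from above: a custom command could in principle drive the same steps with subprocess, dropping the Bash dependency while inheriting the other cons. This is only a sketch of the idea, not something I actually ship:

# sketch only: running the node.js setup steps from within setup.py
# assumes nodeenv is already installed when this command runs
import subprocess
from setuptools import setup
from setuptools.command.develop import develop

class DevelopWithNode(develop):
    def run(self):
        super().run()
        subprocess.check_call(["nodeenv", "-p"])
        with open("requirements_node.txt") as f:
            packages = f.read().split()
        subprocess.check_call(["npm", "install", "-g"] + packages)

setup(
    # ... usual package metadata ...
    cmdclass={"develop": DevelopWithNode},
)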

What is the canonical (if there is any) or just a sane, potentially cross-platform approach to defining node.js dependencies for a Python package, making it publishable?

Why is this question even relevant? JavaScript is not just for web development (any more). There are also interesting (relevant) data processing tools out there. If you do not want to miss / ignore them, well, welcome to this particular form of hell.


I recently came across calmjs, which appears to be what I am looking for. I have not experimented much with it yet and it also appears to be a relatively young project.

I started an issue there asking a similar question.
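
Based on my reading of its documentation (untested by me, so treat the details with caution), declaring Node.js dependencies with calmjs appears to look roughly like this:

# rough sketch per the calmjs documentation, not verified by me
# ("some-js-tool" and its version are made up for illustration)
from setuptools import setup

setup(
    name='example.package',
    setup_requires=['calmjs'],
    package_json={
        'dependencies': {
            'some-js-tool': '~1.0.0',
        },
    },
)

The declared dependencies would then apparently be installed with something like calmjs npm --install example.package.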


EDIT (1): Interesting resource: JavaScript versus Research Computing - A Brief Guide for Those Who Regret That This Has Become Necessary


EDIT (2): I started an issue against nodeenv, asking how I could make a project depend on it.

s-m-e
  • Oh god, why would you do this to yourself? – Luke Mlsna May 07 '18 at 22:35
  • @LukeMlsna Sometimes we don't have a choice, when building things in the name (or with the name) of interoperability. – metatoaster May 07 '18 at 23:25
  • @s-m-e What Node algorithm or functionality are you using that doesn't exist in Python? Also, have you considered Docker? You could just ship a container with both JS + Python dependencies installed... – duhaime May 08 '18 at 02:42
  • @duhaime Docker is certainly an option, but it is not exactly lightweight or easy to deploy, assuming that your users don't have the option of running Docker containers. – s-m-e May 08 '18 at 06:40
  • @LukeMlsna I am asking myself the exact same thing ;) – s-m-e May 08 '18 at 06:41
  • @s-m-e What Node package do you need? I have a hunch there is or could be a Python analogue. – duhaime May 08 '18 at 10:04

2 Answers


(Disclaimer: I am the author of calmjs)

After mulling over this particular issue for another few days, I find that this question actually encapsulates multiple problems which may or may not be orthogonal to each other, depending on one's point of view. Consider some of the following (the list is not exhaustive):

  1. How can a developer ensure that they have all the information required to install a package when given one?
  2. How does a project ensure that the ground it is standing on is solid (i.e. that it has all the dependencies it requires)?
  3. How easy is it for the user to install the given project?
  4. How easy is it to reproduce a given build?

For a single-language, single-platform project, the first question is trivially answered: just use whatever package management solution is implemented for that language (i.e. Python: PyPI, Node.js: npm). The other questions then generally fall into place.

For a multi-language, multi-platform project, this is where it completely falls apart. Long story short, this is why projects generally ship multiple sets of instructions for whatever version of Windows, Mac or Linux (of various mainstream distros) for the installation of their software, especially in binary form. This addresses the third question so that installation is easy for the end user (which usually ends up being doable, but not necessarily easy).

Developers and system integrators, who are definitely more interested in questions 2 and 4, likely want an automation script for whatever platform they are on. This is more or less what you already have, except it only works on Linux, or wherever Bash is available. Now this raises the question: how does one ensure Bash is available on the system? Some system administrators may prefer some other form of shell, so we are back to the same problem, except instead of asking whether Node.js is there, we have to ask whether Bash is there. So this problem is basically unsolvable unless a line is drawn somewhere.

The first question hasn't really been addressed yet, and I am going to make this fun by asking it in reverse: given a package from npm that requires a Python package, how does one specify a dependency on PyPI? It turns out such a project exists: nopy. I have not used it before, but at a casual glance it provides a specific way to record dependency information in the package.json file, which is the standard way for Node.js packages to convey information about themselves. Do note that it has a non-standard way of managing Python packages; however, given that it uses whatever Python is available, it will probably do the right thing if a Python virtual environment was activated. Doing it this way also means that dependants of a Node.js package may have a way to figure out the Python dependencies declared by their Node.js dependencies, but note that without something else on top of it (or some other baseline), there is no way to assert from within the environment that what needs to be done is guaranteed to happen.

Naturally, coming back to Python, this question has been asked before (though not necessarily in a way that is useful specifically to you, as the contexts are all different).

Anyway, calmjs only solves problem 1, i.e. it lets developers figure out the Node.js packages they need from a given Python package, and to a lesser extent assists with problem 4; but without the guarantees of 2 and 3, the problem is not exactly solved.

From the Python dependency management point of view, there is no way to guarantee that the required external tools are available until their usage is attempted (it will either work or not work), and likewise from the Node.js side, as explained earlier (thank you for your question on the issue tracker, by the way). If this particular guarantee is required, many system integrators would make use of their favorite operating-system-level package manager (i.e. dpkg/apt, rpm/yum, or whatever else on Linux, Homebrew on OS X, perhaps Chocolatey on Windows), but again this requires further dependencies to install. Hence, if multiple platforms are to be supported, there is no general solution, unless one reduces the scope, or has some kind of standard continuous integration that generates working installation images which are then deployed onto whatever virtualisation service the organisation uses (just an example).
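
To make the "it will either work or not work" point concrete: about the best a Python package can do is probe for the external tool at usage time and fail loudly. A minimal sketch (the version threshold is an arbitrary example):

# minimal sketch: probing for an external node.js binary at usage time
import shutil
import subprocess

def require_node(minimum_major=8):  # arbitrary example threshold
    node = shutil.which("node")
    if node is None:
        raise RuntimeError("node.js is required but was not found on PATH")
    # node prints its version as e.g. "v8.11.1"
    out = subprocess.run([node, "--version"],
                         capture_output=True, text=True, check=True)
    version = out.stdout.strip()
    if int(version.lstrip("v").split(".")[0]) < minimum_major:
        raise RuntimeError("node.js >= %d is required, found %s"
                           % (minimum_major, version))
    return node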

Without all the specific baselines, this question is very difficult to provide a satisfactory answer for all parties involved.

metatoaster
  • Thanks a lot for this very exhaustive answer :) It took me a while to think about it - you're right, it's actually multiple separate questions packed into one. I suppose I need to break this down into pieces. – s-m-e Apr 28 '18 at 09:57
  • Well, breaking it down also has the unfortunate effect of losing sight of the big picture, because they are all interconnected in sometimes stupid ways. It is a perfectly fine question, imo. – metatoaster Apr 28 '18 at 12:13
  • @metatoaster Thank you for your valuable answer; it is also helpful for me. – heart hacker May 08 '18 at 06:20

What you describe is certainly not the simplest problem. For Python alone, companies have come up with all kinds of packaging methods (e.g. Twitter's pex, Spotify's dh-virtualenv, or even grocker, which shifts Python deployments into container space) - (plug: I did a presentation at PyCon Balkan '18 on Packaging Python applications).

That said, one very hacky way I could think of would be:

  • Find a way to compile your Node apps into a single binary. There is pkg (a blogpost about it), which

[...] enables you to package your Node.js project into an executable that can be run even on devices without Node.js installed.

This way the Node tools would be taken care of.

  • Next, take these binary blobs and add them (somehow) as scripts to your Python package, so that they get distributed along with your package and find their place where your actual Python code can pick them up and execute them (see the sketch below).
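
What the second step might look like on the Python side, as a sketch (the layout mypackage/bin/<system>/ and the tool name mytool are assumptions, not a standard):

# sketch: locating and running a bundled, pkg-built binary from inside the package
# assumed layout: mypackage/bin/<system>/mytool[.exe]
import platform
import subprocess
from pathlib import Path

def run_bundled_tool(*args):
    suffix = ".exe" if platform.system() == "Windows" else ""
    binary = (Path(__file__).parent / "bin"
              / platform.system().lower()  # "linux", "darwin" or "windows"
              / ("mytool" + suffix))
    return subprocess.run([str(binary)] + list(args),
                          capture_output=True, text=True, check=True)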

Upsides:

  • Users do not need Node.js on their machines (which is probably expected when you just want to pip install something).
  • Your package gets more self-contained by including binaries.

Downsides:

  • Your Python package will include binaries, which is less common.
  • Containing binaries means that you will have to prepare versions for all platforms. Not impossible, but more work.
  • You will have to expand your package creation pipeline (Makefile, setup.py, or other) a bit to make this simple and repeatable.
  • Your package gets significantly larger (which is probably the least of the problems today).
miku
  • `nodeenv -p` does in fact take care of pulling a pre-built node binary into my virtual environment. This might be unexpected / unwanted for some users, so I guess before running it, I should check for a preexisting installation of node or just leave this step to the "user" altogether. – s-m-e Apr 23 '18 at 07:03