I'm building a parsing engine, which will take JSON strings as input, parse the JSON string, and output the parsed JSON string. I'd like the parsing engine to run as either a daemon or service, so I can deploy it using Docker. It needs to be extremely high performance, because it will parse high volumes of data.

I understand I could just have a script which launches sed as a background process. But it seems like launching and re-launching a process will incur overhead, thus reducing performance. I'm thinking that running sed as a daemon or service might give me the convenience of using an existing, well-vetted tool while maximizing system performance.

Additionally, if awk or another existing tool would be better suited to this purpose, I am open to other options. But I'd like it to be a well-vetted Linux/Unix tool if possible, just to avoid re-inventing the wheel.

I read this SO question, and this one regarding running Emacs as a daemon, but neither seems to work for sed.

I have also considered piping stdin to sed running as a daemon, as sketched below, but I'm not sure if that is the best approach.
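
For illustration, this is roughly what I have in mind (a rough sketch only; the FIFO path and the sed expression are placeholders):

```
mkfifo /tmp/parse.fifo

# One persistent sed process; -u (GNU sed) disables output buffering,
# so each result is written as soon as the line is processed.
sed -u 's/foo/bar/' < /tmp/parse.fifo > /tmp/parse.out &

# Hold the write end open on fd 3 so sed never sees EOF between inputs.
exec 3> /tmp/parse.fifo

echo '{"foo": 1}' >&3    # handled by the same sed process, no new fork/exec
echo '{"foo": 2}' >&3

exec 3>&-                # closing the fd delivers EOF; sed then exits
```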

UPDATE: The key thing I am trying to ask is this: how can I run sed, awk, or jq as a daemon, so that I can pass many strings to it without incurring the overhead of launching a new process?

MikeyE
  • Not sure what a "parsed JSON string" looks like, but [jq](https://stedolan.github.io/jq/) is probably a better choice than awk or sed. – Benjamin W. Apr 05 '20 at 22:40
  • @BenjaminW. I've actually been reading the jq docs since I posted my question, so your comment was very timely! – MikeyE Apr 05 '20 at 23:21
  • Why the votes to close my question? Even if jq is the "right" tool, how do I run it as a service or daemon? – MikeyE Apr 05 '20 at 23:23
  • I didn't vote to close; the close vote says "needs more focus". – Benjamin W. Apr 05 '20 at 23:50
  • Does using named pipes make sense? Somewhat like what is described here: https://hackaday.com/2019/07/12/linux-fu-named-pipe-dreams/ – ranga Apr 06 '20 at 06:20
  • If your question is just about running as a daemon, you may find answers under [Run bash script as daemon](https://stackoverflow.com/questions/19233529/) or [“Proper” way to run shell script as a daemon](https://unix.stackexchange.com/questions/426862/). – U880D Apr 06 '20 at 08:02
  • I could combine the solutions offered by @ranga and @U880D and run a script, and thus a Linux process, as a daemon, redirecting its input/output using named pipes. That would definitely accomplish my secondary needs, but my key question is how to run a process itself as a daemon, so it doesn't need to be re-instantiated every time through a loop in a bash script. It might not be possible with `sed`, `awk`, or `jq`; I'm just trying to see if it is. The key part of my question being, "without incurring the overhead of launching a new process". But all solutions posted are much appreciated. – MikeyE Apr 07 '20 at 01:47

1 Answer

(this was too big for a comment)

The way I understand it, these classic Unix text-processing tools such as sed, awk, etc. are written as filters, which process an input stream and produce an output stream. They are not built to be daemons; they terminate after processing the input stream, and EOF on the input stream will eventually terminate the filter. So you'll have to keep that pipe open.
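
For example, a small wrapper script can hold the write end of a named pipe open so that a single jq process survives across many inputs. This is only a sketch; the paths and the identity filter `.` are arbitrary choices:

```
#!/bin/sh
# Sketch: one long-lived jq process fed through a named pipe.
FIFO=/tmp/jsonparse.fifo
OUT=/tmp/jsonparse.out

mkfifo "$FIFO"

# --unbuffered makes jq flush its output after each input value.
jq --unbuffered -c . < "$FIFO" > "$OUT" &

# Keep one writer attached: as long as at least one process holds the
# FIFO open for writing, jq never receives EOF and keeps running.
exec 3> "$FIFO"

# Any number of producers can now feed JSON to the same jq process:
echo '{"hello": "world"}' > "$FIFO"
```

One caveat: POSIX only guarantees that writes to a pipe are atomic up to PIPE_BUF bytes, so concurrent producers writing very large documents could interleave their output.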

If you don't like the idea of wrapping the tool with a shell script, the functionality needed to keep the pipe open, turn the process into a daemon, and later close the open file descriptor to gracefully terminate the process could instead be implemented in the constructor/destructor (init/fini) of a shared library, preloaded (with LD_PRELOAD) while running the tool.
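
Assuming such a library existed (the name `libpipekeeper.so` below is made up, not a real project), running the tool might look like this:

```
# Hypothetical: libpipekeeper.so's constructor would daemonize the
# process and keep the input fd open; its destructor would close it
# so the filter sees EOF and terminates gracefully.
LD_PRELOAD=./libpipekeeper.so jq --unbuffered -c . < /tmp/jsonparse.fifo > /tmp/jsonparse.out
```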

If you choose to implement something like that, the daemonize project can be a good starting point.

ranga