So I have to write a JSON file from a bash script, and I know I can do something like `echo 'something' >> $file` to build the file up slowly, but redirecting `echo` instead of using real file output seems kind of "hacky." If that is the best way, and not hacky at all, I am happy to use `echo`, but I was wondering whether there is a better way to write a file from a bash script.


- unfortunately, `echo` is the *"best"* way to do it... and unfortunately, if you take *hacky* away from `bash`, then we have no reason to use it any more... – Jason Hu Jul 06 '15 at 20:23
- Define "most efficient"? Less processor-intensive? Less disk-intensive? Less code? Easier-to-read code? – Mr. Llama Jul 06 '15 at 20:28
- @CharlesDuffy well, using `echo` doesn't mean we have to do it line by line. We can concatenate the whole string first, then use `echo` once. That may well be faster than any other way you can think of. – Jason Hu Jul 06 '15 at 20:28
- @HuStmpHrrr, yes, but once the re-opening penalty from doing a separate redirection on each line is eliminated, calling a builtin once per line is going to be faster than calling an external process just once, unless you're talking about a very large number of lines. – Charles Duffy Jul 06 '15 at 20:31
- @CharlesDuffy - Do you have any numbers to back up that assertion? – Mr. Llama Jul 06 '15 at 20:34
- @CharlesDuffy I don't call any external processes. Concatenating the string in a batch and `echo`ing it at once doesn't require any external processes. – Jason Hu Jul 06 '15 at 20:36
- @Mr.Llama, I'd be glad to generate some. What would you want? A comparison of the number of lines per second which can be written with `echo "foo" >&3` with a pre-opened FD, versus the number of lines per second which can be written with `echo "foo" >>file`, maybe? – Charles Duffy Jul 06 '15 at 20:36
- @HuStmpHrrr, indeed -- I don't object to your internal concatenation approach; I object to the `cat` approach that's otherwise being widely suggested. – Charles Duffy Jul 06 '15 at 20:37
- BTW -- **don't ever generate JSON this way**. `jq` is the right tool for the job -- though that's about correctness, not efficiency. If you asked "what's the **best** way to write a JSON file in bash", that would be a completely different question. – Charles Duffy Jul 06 '15 at 20:38
- @CharlesDuffy - I'm mostly curious as to at what point a repeated `echo "foo" >> file` takes the same amount of time as a `cat` with a heredoc for the same number of lines. As far as raw time, I understand that using a file descriptor and a block will be faster than either of the two alternatives. – Mr. Llama Jul 06 '15 at 20:38
- @CharlesDuffy I am in a situation where I cannot rely on external dependencies (it will be for a company, not for myself). I am looking at the documentation, but I can't seem to find how portable it is. – THIS USER NEEDS HELP Jul 06 '15 at 21:28
- @user3831137, what do you mean by "portable"? If that's a question of dependencies, `jq` has none other than libc, so it's possible to build pretty much anywhere (and trivial to generate a static binary that will work across distros within an OS and CPU architecture). – Charles Duffy Jul 06 '15 at 21:30
- @user3831137, ...if you need something more portable while retaining correctness, I'd use Python rather than bash -- all modern Python interpreters ship with a JSON parser and generator. – Charles Duffy Jul 06 '15 at 21:31
- @CharlesDuffy So apparently the server this script is going to run on has Python 2.6, and my boss told me to stick to bash if possible. I guess my question is: can I set up jq on my local machine and just run it on the server? – THIS USER NEEDS HELP Jul 06 '15 at 21:50
- So -- where is the input from this script coming from? How well do you control it, and how much do you trust it to be normalized? Who are the consumers of this JSON? If it includes anything that could be controlled by an outside user, it's worth pushing back -- sticking to pure bash isn't worth a security breach. (I lean towards the paranoid end, but then, I've seen TB of backups deleted by a script that had a bug triggered by a filename that "couldn't happen" and some missing quotes.) – Charles Duffy Jul 06 '15 at 21:58
- "Set up jq on my local machine and just run it on the server" isn't really clear in terms of what you mean. If you mean compile it on your local machine and copy the compiled binary over to the server, then yes -- if they're not the same architecture you'll need to cross-compile, which is more than I can describe how to do here, but it's definitely possible. OTOH, that kind of practice isn't a good idea -- it makes for hard-to-maintain systems when software depends on binaries people hand-built. And Python 2.6 *does* have a JSON module built in (see the sketch after these comments): https://docs.python.org/2.6/library/json.html – Charles Duffy Jul 06 '15 at 22:00
- @HuStmpHrrr, ...back to echo as "best": see the POSIX echo spec's notes re: deprecation in favor of printf. :) – Charles Duffy Jul 07 '15 at 01:39
- Possible duplicate of [Output JSON from Bash script](https://stackoverflow.com/q/12524437/608639) – jww Jun 22 '19 at 18:20
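
As the comments above note, Python 2.6's built-in `json` module can do the encoding without any external dependencies. A minimal sketch, staying inside a bash script and delegating only the JSON generation to the stock interpreter (the variable name and output path are illustrative, not from the question):

```bash
# Hypothetical example: let Python's built-in json module do all the
# escaping, so the bash script never hand-assembles JSON.
# Works with any stock Python >= 2.6.
some_value='He said "hi" & left'

python - "$some_value" <<'PY' > output.json
import json, sys
print(json.dumps({"something": sys.argv[1]}))
PY
```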
5 Answers
Efficiently generating output

`echo` is a built-in, not an external command, so it's not nearly as inefficient as you think. What is inefficient is putting `>> filename` on the end of each `echo`.
This is bad:

# EVIL!
echo "something" >file
echo "first line" >>file
echo "second line" >>file

This is good:

# NOT EVIL!
{
    echo "something" >&3
    printf '%s\n' "first line" "second line" >&3
    # ... etc ...
} 3>file

...which opens the output file only once, eliminating the major inefficiency.
To be clear: Calling `echo`, say, 20 times is considerably more efficient than calling `cat` once, since `cat` is an external process, not part of the shell. What's highly inefficient about running `echo "foo" >>file` 20 times is opening and closing the output file 20 times; it's not `echo` itself.
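
To put rough numbers on that claim, here is a minimal, hypothetical micro-benchmark (the file names and line count are arbitrary) contrasting per-line reopening with a single grouped redirection:

```bash
#!/usr/bin/env bash
# Hypothetical micro-benchmark: per-line reopening vs. one open FD.
lines=10000

# Reopens slow.txt on every iteration -- the expensive pattern.
time for ((i = 0; i < lines; i++)); do
    echo "line $i" >>slow.txt
done

# FD 3 is opened once and stays open for the whole loop.
time {
    for ((i = 0; i < lines; i++)); do
        echo "line $i" >&3
    done
} 3>fast.txt
```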
Correctly generating JSON

Don't use `cat`, `echo`, `printf`, or anything else of the sort. Instead, use a tool that understands JSON -- any other approach will lead to potentially incorrect (perhaps even exploitable via injection attacks) results.
For instance:

jq \
  --arg something "$some_value_here" \
  --arg another "$another_value" \
  '.["something"]=$something | .["another_value"]=$another' \
  <template.json >output.json

...will generate a JSON file, based on `template.json`, with `something` set to the value in the shell variable `$some_value_here` and `another_value` set to, well, a second value. Unlike naive approaches, this will correctly handle variable values which contain literal quotes or other characters which need to be escaped to be correctly represented.
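
If there is no template file to start from, the same idea works with `jq -n`, which builds the document from scratch. A small sketch (the variable names and values are illustrative):

```bash
some_value_here='He said "hi" & left'
another_value=$'line one\nline two'

# jq escapes the quotes, newlines, etc. in the --arg values for us.
jq -n \
  --arg something "$some_value_here" \
  --arg another "$another_value" \
  '{something: $something, another_value: $another}' >output.json
```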
An aside on echo

All the above having been said -- `echo` should be avoided in favor of `printf` (with an appropriate, static format string). Per the POSIX sh standard:
APPLICATION USAGE
It is not possible to use echo portably across all POSIX systems unless both -n (as the first argument) and escape sequences are omitted.
The printf utility can be used portably to emulate any of the traditional behaviors of the echo utility as follows (assuming that IFS has its standard value or is unset):
[...]
New applications are encouraged to use printf instead of echo.
RATIONALE
The echo utility has not been made obsolescent because of its extremely widespread use in historical applications. Conforming applications that wish to do prompting without <newline> characters or that could possibly be expecting to echo a -n, should use the printf utility derived from the Ninth Edition system.
As specified, echo writes its arguments in the simplest of ways. The two different historical versions of echo vary in fatally incompatible ways.
The BSD echo checks the first argument for the string -n which causes it to suppress the <newline> that would otherwise follow the final argument in the output.
The System V echo does not support any options, but allows escape sequences within its operands, as described for XSI implementations in the OPERANDS section.
The echo utility does not support Utility Syntax Guideline 10 because historical applications depend on echo to echo all of its arguments, except for the -n option in the BSD version.
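
For a concrete illustration of the portability problem (the example value is arbitrary):

```bash
value='-n'

printf '%s\n' "$value"   # always prints "-n" followed by a newline
echo "$value"            # some shells print "-n"; others treat it as an
                         # option and print nothing -- behavior varies
```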

- Heck, if you're going to just use `echo`, you might as well use multiline strings. ;) – Mr. Llama Jul 06 '15 at 20:31
- @Charles Duffy: Are you sure the efficiency meant by the OP is the execution time and not, say, code readability? – Eugeniu Rosca Jul 06 '15 at 20:31
- @EugeniuRosca, when has "efficiency" ever meant "readability"? They're completely different (and sometimes conflicting) goals. If the OP meant "terseness", they should have said that. :) – Charles Duffy Jul 06 '15 at 20:32
- Just for laughs: this reminds me that my boss wrote a whole batch of code to try to use pure bash to analyze JSON data (and it took lots of time). – Jason Hu Jul 06 '15 at 20:41
- @HuStmpHrrr, I've done some surprisingly fancy analysis (including joins between separate input sources) in `jq` -- it's a surprisingly powerful language. – Charles Duffy Jul 06 '15 at 20:44
- @CharlesDuffy ah yeah, I agree that `jq` is nice. But I forgot to tell you the punch line: he mainly used `grep` and `awk`, and he doesn't know `jq`. – Jason Hu Jul 06 '15 at 20:47
- @JasonHu Just a small nitpick: awk is not pure bash... pure bash would be builtins only, as any external app can be used from any shell, or even from another env like Python... – Dani_l Feb 16 '20 at 07:00
You can use `cat` and here-document format:

cat <<'EOF' > output.json
{
  "key": "value",
  "num": 5,
  "tags": ["good", "bad"],
  "money": "$0"
}
EOF
Note the single quotes around the here-document anchor (`'EOF'`). They prevent interpolation of the document's contents; without them, the `$0` would be substituted with the name of the shell or script.
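
A quick sketch of that difference (the variable is illustrative):

```bash
name="world"

cat <<'EOF'    # quoted delimiter: no expansion; prints: hello $name
hello $name
EOF

cat <<EOF      # unquoted delimiter: expansion; prints: hello world
hello $name
EOF
```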
If you define efficiency as raw speed as opposed to readability, you should consider using Charles Duffy's answer instead, as it's almost an order of magnitude faster for a small number of lines (`echo` 0.01s vs `cat` 0.1s).

If you need to create files larger than a few hundred lines, you should consider a method other than `cat`/`echo`.
- I'd hardly call this more efficient -- calling `cat` involves a fork()/exec() cycle. – Charles Duffy Jul 06 '15 at 20:27
- Ahh -- I see you've already done the benchmarking. Much appreciated, and +1. :) – Charles Duffy Jul 06 '15 at 20:46
Construct the data in a shell variable, and echo it once.
var=something
var="$var something else"
var="$var and another thing"
echo "$var" > file

- Needs quotes -- `echo "$var"` -- to avoid string-splitting and glob expansion of the value; especially important if that value includes newlines, which `echo $var` would throw away. – Charles Duffy Jul 06 '15 at 20:47
- I'd also suggest using the bash-native append syntax: `var+="something else"$'\n'` (demonstrating also the bash syntax for a newline). – Charles Duffy Jul 06 '15 at 20:48
- ...for a long enough buffer, btw, it might be more efficient to collect the strings in an array, and use `printf` to emit them all at once -- with a delimiter, if one so chooses, i.e. `arr=( ); arr+=( "first line" "second line" ); arr+=( "third line" ); printf '%s\n' "${arr[@]}"` (or without the `\n` if one wants one big line) -- that way one avoids inefficiencies in how bash does array concatenation. (A fleshed-out sketch follows these comments.) – Charles Duffy Jul 06 '15 at 20:50
- ...also, see the POSIX spec talking about why `echo` is deprecated in the RATIONALE and APPLICATION USAGE sections of http://pubs.opengroup.org/onlinepubs/009604599/utilities/echo.html – Charles Duffy Jul 06 '15 at 20:50
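
A fleshed-out version of the array approach from the comment above (the output file name is illustrative):

```bash
# Collect lines in a bash array, then write them all with a single
# printf, so the output file is opened exactly once.
arr=()
arr+=( "first line" "second line" )
arr+=( "third line" )
printf '%s\n' "${arr[@]}" > file
```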
Besides `echo`, you could use `cat`:
cat > myfile << EOF
Hello
World
!
EOF

- As I commented elsewhere, forcing a call to an external tool such as `/usr/bin/cat` is hardly an efficiency optimization. – Charles Duffy Jul 06 '15 at 20:27
You can use `cat` and "heredocs" to minimize the number of calls you have to make.
$ cat foo.sh
cat <<'HERE' > output
This
that
the other
indentation is
preserved
as are
blank lines
The end.
HERE
$ sh foo.sh
$ cat output
This
that
the other
indentation is
preserved
as are
blank lines
The end.
