Unix tool to reduce file chunks to a string (--delim=, --pre=[ --suf=] < file)

Question

I am looking for a program that joins strings with a delimiter character, prepends one character, and appends another. I may have used the wrong keywords in my search, but I was not able to find a suitable Unix tool for this.

Suppose I have a file (note the starting empty lines):

file in.txt:



{
"some": "json",
"with_different": "intendation",
  "which": [],
  "has": 2
}

{
 "json":"objects"
}

and generate out.txt

[
{
"some": "json",
"with_different": "intendation",
  "which": [],
  "has": 2
}
,
{
 "json":"objects"
}
]

Basically, I want a JSON-array from that, meaning:

get rid of the leading empty lines (uniq | tail --lines=+2),
replace empty lines with a comma (sed -e 's/^$/,/g') and
prepend/append [ and ] (awk 'BEGIN {print "["} {print $1} END {print "]"}').

uniq <in.txt | tail --lines=+2 | sed -e 's/^$/,/g' | awk 'BEGIN {print "["} {print $1} END {print "]"}' gives me what I want, but I do not think it is elegant.

I have found paste, xargs, join, but they do not help me. Also I know about the OFS variable in awk, which may replace the sed part, but I don't know how to convince awk to treat all 'non-empty' lines as $1 (probably using IFS, but IFS='^$' is surely not working.) And then we still have the other boilerplate around it.

I am hoping that someone can point me to a magic program like magic -d"," -s"[" -e"]" <in, provided I have cleaned up the empty lines above, or the objects are one-liners

file in:

{"some":"json",  "which":[],  "has": 2}

{ "json":"objects"}

to file out:

[
{"some":"json",  "which":[],  "has": 2}
,
{ "json":"objects"}
]

Another example would be echo "a b c" | magic -d',' -s'[' -e']' returns [a,b,c].

Or, to not only give JSON examples: echo "my new component" | magic -d'-' -s'<' -e'>' returns <my-new-component>.

Notes:

jq -s . would work for this json-problem (cf. How to combine the sequence of objects in jq into one object?) but if the start/end/delim chars are something else it wouldn't.
I am fine with line breaks being removed.
I would really like to have a shorter one-liner than my own attempt

works for this scenario, as it's `json`. I still wonder whether there are `unix` tools that can be used more cleverly to solve this (more general) problem, specifying delimiter and prefix/suffix. – Joel Aug 12 '20 at 07:46
[Here](https://pubs.opengroup.org/onlinepubs/9699919799/idx/utilities.html) is a list of standard utilities, see if there is one. – oguz ismail Aug 12 '20 at 07:50
You should watch out for the `uniq` at the start of your current solution. You might (a) have adjacent "non-empty" lines that you want to preserve (e.g. opening or closing curly brackets if their indentation matches), and (b) have adjacent "whitespace only" lines that you want to `uniq` but that are not affected because they have different whitespace. – borrible Aug 12 '20 at 08:56
Are you looking for a tool to convert what you show under `in.txt` into the text under `in` or something else? If so, then naming your expected output `out` would be clearer than naming it `in`, if not then idk what it is you're looking for. – Ed Morton Aug 12 '20 at 18:57
@EdMorton added out-examples, hopefully that makes it clearer. @borrible Good point and true for the general case. In this case, as those in-files are generated, it is granted that `uniq – Joel Aug 12 '20 at 21:43

score 1 · Answer 1 · answered Aug 12 '20 at 22:49

$ awk -v RS= 'BEGIN{sep="[\n"} {printf "%s%s", sep, $0; sep="\n,\n"} END{print "\n]"}' in.txt
[
{
"some": "json",
"with_different": "intendation",
  "which": [],
  "has": 2
}
,
{
 "json":"objects"
}
]

.

$ awk -v RS= 'BEGIN{sep="[\n"} {printf "%s%s", sep, $0; sep="\n,\n"} END{print "\n]"}' in
[
{"some":"json",  "which":[],  "has": 2}
,
{ "json":"objects"}
]
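
The same paragraph-mode idea can be parameterized to get closer to the `magic -d -s -e` interface from the question. This is only a sketch: the function name `wrap_chunks` and its argument order are made up for illustration and do not belong to any standard utility. Note that values passed with `awk -v` undergo backslash-escape processing, so this assumes simple delimiters.

```shell
# wrap_chunks PRE DELIM SUF [FILE]
# Hypothetical helper: joins blank-line-separated chunks with DELIM on its
# own line and wraps the result in PRE and SUF. Reads stdin if FILE is omitted.
wrap_chunks() {
    # RS= selects awk's paragraph mode: each run of non-empty lines is one
    # record, and leading blank lines are skipped.
    awk -v RS= -v pre="$1" -v delim="$2" -v suf="$3" '
        BEGIN { sep = pre "\n" }                        # first separator is the prefix
        { printf "%s%s", sep, $0; sep = "\n" delim "\n" }
        END { print "\n" suf }
    ' "${4:--}"
}
```

Usage would then be `wrap_chunks '[' ',' ']' in.txt` or `wrap_chunks '[' ',' ']' <in`.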

Ed Morton

Thanks, using `RS` is indeed better than my approach. However, I seek a shorter command than that long `awk`. If there are no other suggestions in a couple of days, I'll accept this. – Joel Aug 13 '20 at 02:04
Why do you care how long the command is? Aren't clarity and simplicity more important than brevity? If you want to make it shorter then you could rename the variable `sep` to `s` and save yourself 6 characters and init it on the command line instead of in a BEGIN section to save a few more and get rid of white space to save yet more, e.g. `awk -v RS= -v s='[\n' '{printf "%s%s",s,$0;s="\n,\n"}END{print "\n]"}'` but IMHO that's all just pointlessly obfuscating the code. – Ed Morton Aug 13 '20 at 13:40
Maybe I should have written “simpler” instead of “short”, that’s actually what I’m after. Since I have that problem rather often, I thought that there was a _dedicated_ program for that, which I hoped someone could point me to. Sure, I could put that program as a function or alias in my .zshrc, but that would not tell me whether I could also use some other, very simple command. That’s why I asked in the first place. And btw, I also don’t get why there are downvotes (esp. without constructive comments)... – Joel Aug 13 '20 at 20:05
OK - no, there is no dedicated program for that as it's not a common problem and your input file isn't in a format defined by any standard. idk why you're getting downvotes, I didn't downvote, but constructive comments are fairly frequently met with negative, personal attacks in response (just happened to me in a different thread) so it's not unreasonable to just downvote without leaving a comment. – Ed Morton Aug 13 '20 at 20:13
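
For the single-line, word-level examples in the question (`a b c` to `[a,b,c]`), a small wrapper around `awk` is one possible stand-in for the missing dedicated program. This is a sketch only: `wrap_words` is a hypothetical name, and it assumes the words are separated by whitespace and that the delimiter contains no backslashes (values passed with `-v` get escape processing).

```shell
# wrap_words PRE DELIM SUF   (hypothetical helper, reads stdin)
wrap_words() {
    awk -v pre="$1" -v OFS="$2" -v suf="$3" '
        # Assigning to a field makes awk rebuild $0 with OFS between fields.
        { $1 = $1; print pre $0 suf }
    '
}

echo "a b c" | wrap_words '[' ',' ']'              # prints [a,b,c]
echo "my new component" | wrap_words '<' '-' '>'   # prints <my-new-component>
```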

M. Nejat Aydin · Answer 2 · 2020-08-14T02:36:41.007

1

Using GNU sed:

sed '/./,$!d; s/^/[\n/; :a; n; s/^$/,/; $s/$/\n]/; ba' in.txt

with the assumption that there is no trailing blank line in the input (leading ones are discarded).

Alternatively:

sed -n '/./{s/^/[\n/; :a; p; n; s/^$/,/; $s/$/\n]/; ba}' in.txt

edited Aug 14 '20 at 02:36

answered Aug 14 '20 at 02:13

How does the first command match leading blank lines? Maybe you can also add more info on the intermediate commands. Looking at `man sed`, these are my assumptions: `s/^/[\n/;` matches the start and replaces it w/ `[`, `:a;` is a label to jump to after each line, `n;` appends the current line, `s/^$/,/;` replaces empty lines w/ `,`, `$s/$/\n]/;` replaces the final line w/ `]`, `ba` branches back to label `a`. – Joel Aug 14 '20 at 02:31
@Joel `/./` matches a non-blank line. `/./,$` matches lines between the first non-blank line and the last line, inclusively. The `!` negates matching line range, thus remaining lines are leading blank lines, which are deleted by the `d` command. – M. Nejat Aydin Aug 14 '20 at 02:40
@Joel `n` prints the pattern space if the `-n` flag isn't specified as a command line argument, and reads the next line. If there is no next line, the `sed` quits. – M. Nejat Aydin Aug 14 '20 at 02:44
@Joel You may want to read [this document](https://www.gnu.org/software/sed/manual/sed.html) for detailed information about GNU sed. – M. Nejat Aydin Aug 14 '20 at 02:52
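
The range-and-negate step explained in the comments above can be tried in isolation. The following is a minimal sketch; the function name `strip_leading_blanks` is only for illustration.

```shell
# Lines before the first non-empty line fall outside the range "/./,$",
# so the negated "d" deletes them. Empty lines after that point are inside
# the range and are kept. Note that /./ also matches whitespace-only lines.
strip_leading_blanks() {
    sed '/./,$!d' "$@"
}

printf '\n\nfirst\n\nsecond\n' | strip_leading_blanks
# prints:
# first
#
# second
```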
