There are lots of similar questions but none for dynamically joining 2 files. What I'm trying to do is to dynamically edit the following structure:

{
  "features": [
    {
      "type": "Feature",
      "properties": {
        "name": "0",
        "height": 0.7
      }
    },
    {
      "type": "Feature",
      "properties": {
        "name": "1",
        "height": 0
      }
    }
  ]
}

I want to replace only the one field .features[].properties.name with a random value from a 1d-array inside another txt file. There are 8,000 features and around 100 names I've prepared.

This is what I have so far; it fails with errors:

#!/bin/bash
declare -a names=("name1" "name2" "name3")
jq '{
    "features" : [
        "type" : "Feature",
        "properties" : {
            "name" : `$names[seq 0 100]`,
            "height" : .features[].properties.height
        },
        .features[].geometry
    ]
}' < areas.json

Is it even possible to do this in a single command, or should I use Python or JS for such tasks?

Igniter
  • Is there a reason your `names.txt` is in the format it presently is? I personally would shuffle it out-of-band. – Charles Duffy Jan 17 '19 at 18:10
  • It would also make this much clearer if you showed an example of the intended output. It's not clear to me if the *only* thing you mean by "joining" is just replacing the one value, or if you want to do something else as well. – Charles Duffy Jan 17 '19 at 18:10
  • Nope, just trying to simplify things, actually I can save names in any format – Igniter Jan 17 '19 at 18:11
  • BTW, just as background, `names=$( – Charles Duffy Jan 17 '19 at 18:12
  • I want just take a random name from txt and replace current `.name` inside json with it, that's all – Igniter Jan 17 '19 at 18:12
  • Next question: What's the *size* of your files, and do you need this to run in constant memory? – Charles Duffy Jan 17 '19 at 18:13
  • Yep, thanks for that, I can barely remember some patterns and your comments are valuable – Igniter Jan 17 '19 at 18:14
  • BTW, your input isn't currently valid JSON, so folks can't run it through jq to test answers. (And I actually need to go to lunch very soon, so that kind of ease-of-testing has a lot to do with whether I'm going to get to the point of a useful, tested solution). – Charles Duffy Jan 17 '19 at 18:15
  • 8,000 features (1.7 Mb) and ~100 one-wordly names, no more – Igniter Jan 17 '19 at 18:15
  • Hmmm. So you expect names to be reused, then? – Charles Duffy Jan 17 '19 at 18:16
  • BTW, I'm still getting an `unmatched ]` error trying to parse, even as-edited. – Charles Duffy Jan 17 '19 at 18:18
  • Yep, just random names like `Trump tower`, `Confederate spike`, etc. I have also fixed JSON, thanks for tips – Igniter Jan 17 '19 at 18:18
  • It's still broken: https://jqplay.org/s/GDFlciiA2A – Benjamin W. Jan 17 '19 at 18:19
  • Looks legit, could that be due to `coordinates` mess? – Igniter Jan 17 '19 at 18:20
  • so, if memory isn't a concern here, I'd probably do something like `jq -Rn --slurpfile areas areas.json '...' < <(exec shuf -r words.txt)` to give your JSON the original unmodified file in a jq variable `$areas`, and a stream of words from which you can pull the next item with the jq function `input`. Can't actually write and test the `...` in the time available before I need to run, but hopefully that's a useful starting point. Keep in mind that you don't need to iterate over your default `.` in jq; you can always write `$areas | ...` – Charles Duffy Jan 17 '19 at 18:21
  • https://echarts.baidu.com/examples/data-gl/asset/data/buildings.json – Igniter Jan 17 '19 at 18:22
  • That's the JSON, thanks for that website to play around! – Igniter Jan 17 '19 at 18:23
  • Looks like you have an extra opening `{` for the second object in `features`. – Benjamin W. Jan 17 '19 at 18:24
  • BTW, for future note, if you want to post a JSON sample with unusual characters on the web and have it behave nicely when copied-and-pasted across platforms &c, you can use `jq -a` to convert it into a plain-ASCII representation. – Charles Duffy Jan 17 '19 at 22:31

3 Answers

3

Your document (https://echarts.baidu.com/examples/data-gl/asset/data/buildings.json) is actually small enough that we don't need to do any crazy memory-conservation tricks to make it work; the following functions as-is:

# create sample data
[[ -e words.txt ]] || printf '%s\n' 'First Word' 'Second Word' 'Third Word' >words.txt

# actually run the replacements
jq -n --slurpfile buildings buildings.json '
  # define a jq function that changes the current property name with the next input
  def replaceName: (.properties.name |= input);
  # now, for each document in buildings.json, replace each name it contains
  $buildings[] | (.features |= map(replaceName))
' < <(shuf -r words.txt | jq -R .)

This works because shuf -r words.txt creates an unending stream of words randomly chosen (with replacement) from words.txt, and the jq -R . inside the process substitution turns each line into a JSON string. Because input is called only once per feature in buildings.json, jq stops reading words as soon as every feature has been renamed; it does not try to keep running once that file's contents have been consumed.
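To see how input pairs names with features, it can help to feed a fixed list instead of shuf output. The following is only an illustrative sketch: rename_from_stdin is a hypothetical helper name, and it assumes the file has a top-level .features array like the sample in the question.

```shell
# Hypothetical helper: rename every feature in the JSON file "$1", taking
# one JSON string per feature from standard input, in order.
rename_from_stdin() {
  jq -n -c --slurpfile doc "$1" '
    $doc[] | .features |= map(.properties.name |= input)
  '
}

# Fixed names instead of shuf output make the pairing easy to follow:
#   printf '"%s"\n' A B C | rename_from_stdin areas.json
```

With the two-feature sample from the question, the first feature gets A and the second gets B; the unused C is simply never read.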


For the tiny two-record document given in the question, the output looks like:

{
  "features": [
    {
      "type": "Feature",
      "properties": {
        "name": "Third Word",
        "height": 0.7
      }
    },
    {
      "type": "Feature",
      "properties": {
        "name": "Second Word",
        "height": 0
      }
    }
  ]
}

...with the actual words varying from run to run. The same command has also been smoke-tested against the full externally hosted file.

Charles Duffy
  • Could you please describe how to save output into the same json using replaceName, since now it just logs everything in the console and stops at multiple `c62;` (don't know what that means)? – Igniter Jan 18 '19 at 00:29
  • @Igniter, redirect to a different file and rename on success. You can't redirect to the same file you're reading from, because opening a file for write deletes everything in it, and files get opened *at the very beginning of the process*, before they've been fully read yet. That's an easy way to have your process fail early, when its input disappears because the input file was truncated because it was opened for write. – Charles Duffy Jan 18 '19 at 00:32
  • @Igniter, ...see also [How can I use a file in a command and redirect output to the same file without truncating it?](https://stackoverflow.com/questions/6696842/how-can-i-use-a-file-in-a-command-and-redirect-output-to-the-same-file-without-t) – Charles Duffy Jan 18 '19 at 00:33
  • Got it, what variable I should use for it? Where does the script store temp content to? Or would you mind throwing a quick template string for that? Or – Igniter Jan 18 '19 at 00:36
  • @Igniter, ...so, you'd typically after the `< <(...)` code have, on the same line, something like `> "buildings.json.$$" && mv "buildings.json.$$" "buildings.json"` (if you're operating in a directory where only you and users you trust have write; if you're operating in `/tmp` or such, you always need to use `mktemp` or similar to protect against attacks -- see https://www.owasp.org/index.php/Insecure_Temporary_File) – Charles Duffy Jan 18 '19 at 00:36
  • If you're on a Mac, you could use gshuf - see https://apple.stackexchange.com/questions/142860/install-shuf-on-os-x – peak Jan 18 '19 at 00:37
  • @peak, ...heh. I actually tested all this on MacOS, but I have a Nix install of coreutils (aside: I'm *hugely* fond of [Nix](https://nixos.org/nix/); never going back to Homebrew/MacPorts), so I didn't realize that `shuf` wasn't coming from the OS vendor. – Charles Duffy Jan 18 '19 at 00:37
  • How many aces do you guys have in your sleeves, I'm already blown away with the current information inserted :) – Igniter Jan 18 '19 at 00:39
  • Worked like a charm, but it has inflated the file size from 1.7 to 3.1 Mb using only `first-third word` combination. Should it be considered OK or I made something wrong? – Igniter Jan 18 '19 at 00:43
  • This file would be served to my users, so I should care about the total filesizes – Igniter Jan 18 '19 at 00:45
  • @Igniter, I'd have to inspect the actual output. Check for if it, say, changed tabs to spaces, or switched to a more verbose ASCII encoding. You might just add the `-c` flag to `jq` for "compact" output. And of course, compressing content is always a good idea. – Charles Duffy Jan 18 '19 at 00:45
  • @Igniter, ...btw, for content where space is really at a premium, I'd usually avoid JSON in favor of [msgpack](https://msgpack.org/) as an over-the-wire protocol. See the JavaScript implementation at http://kawanet.github.io/msgpack-lite/ – Charles Duffy Jan 18 '19 at 00:47
  • That feeling when each new message you see has more valuable information than all your previous week :) thanks! – Igniter Jan 18 '19 at 01:21
  • Is there a way to quickly compare the initial and edited files to be assured that only the names were changed? According to my chart it's clearly seen that some polygon coordinates have been somehow lost and not shown anymore. Using `-c` flag I have the same file size, but something apparently was broken... :( – Igniter Jan 18 '19 at 02:43
  • I'd suggest diffing the stream-format representation of both files. `jq -c tostream out.json-stream` will give you a line-oriented, diff-friendly representation of a JSON file's contents. Generate that for each, diff them, and you should be able to see straightforwardly what changed. – Charles Duffy Jan 18 '19 at 02:50
  • BTW, have you made sure that `jq -c . out.json` generates output that still parses properly, just so you're testing starting with the base (simplest possible) case? – Charles Duffy Jan 18 '19 at 02:54
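
The two suggestions from the comments above (write to a temporary file and rename on success; compare stream representations) could be sketched roughly as follows. Both function names are hypothetical, mktemp is assumed to be available, and the del(...) filter assumes that .features[].properties.name is the only thing that is supposed to differ.

```shell
# Hypothetical helper: run a command that reads FILE on stdin and writes the
# new contents to stdout, then replace FILE only if the command succeeded.
update_in_place() {
  local file=$1 tmp
  shift
  # Create the temporary file next to the target so that mv is a rename on
  # the same filesystem. mktemp creates it with owner-only permissions,
  # which the renamed file keeps.
  tmp=$(mktemp "$file.XXXXXX") || return 1
  if "$@" <"$file" >"$tmp"; then
    mv -- "$tmp" "$file"
  else
    rm -f -- "$tmp"
    return 1
  fi
}

# Hypothetical helper: succeed only if OLD and NEW differ in nothing but
# .features[].properties.name; otherwise print the differing stream events.
same_except_names() {
  local old=$1 new=$2 a b status
  a=$(mktemp) || return 1
  b=$(mktemp) || { rm -f -- "$a"; return 1; }
  # tostream emits one [path, leaf] event per line, which diffs cleanly.
  jq -c 'del(.features[].properties.name) | tostream' "$old" >"$a" &&
    jq -c 'del(.features[].properties.name) | tostream' "$new" >"$b" &&
    diff "$a" "$b"
  status=$?
  rm -f -- "$a" "$b"
  return "$status"
}

# Example: compact a file in place, then check it against a backup copy.
#   cp buildings.json buildings.orig.json
#   update_in_place buildings.json jq -c .
#   same_except_names buildings.orig.json buildings.json
```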
2

Here's a solution to the problem of choosing the names randomly with replacement, using the very simple PRNG written in jq copied from https://rosettacode.org/wiki/Random_numbers#jq

Invocation:

jq  --argjson names '["name1","name2","name3","name4"]' \
  -f areas.jq areas.json

areas.jq

# The random numbers are in [0 -- 32767] inclusive.
# Input: an array of length at least 2 interpreted as [count, state, ...]
# Output: [count+1, newstate, r] where r is the next pseudo-random number.
def next_rand_Microsoft:
  .[0] as $count | .[1] as $state
  | ( (214013 * $state) + 2531011) % 2147483648 # mod 2^31
  | [$count+1 , ., (. / 65536 | floor) ] ;

# generate a stream of random integers < $n
def randoms($n):
  def r: next_rand_Microsoft
    | (.[2] % $n), r;
  [0,11] | r ;


. as $in
| ($names|length) as $count
| (.features|length) as $n
| [limit($n; randoms($count))] as $randoms
| reduce range(0; $n) as $i (.;
    .features[$i].properties.name = $names[$randoms[$i]] )
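
The stream above always starts from the state [0,11], so each run assigns the same names. If that is not wanted, one possible variation is to pass the initial state in from the shell. This is a sketch only: rename_seeded and $seed are names made up here and are not part of the Rosetta Code original.

```shell
# Hypothetical wrapper: same logic as areas.jq, but the initial PRNG state
# comes from the caller instead of the constant 11.
rename_seeded() {
  local seed=$1 names=$2 file=$3
  jq --argjson seed "$seed" --argjson names "$names" '
    def next_rand_Microsoft:
      .[0] as $count | .[1] as $state
      | ((214013 * $state) + 2531011) % 2147483648   # mod 2^31
      | [$count + 1, ., (. / 65536 | floor)];

    def randoms($n):
      def r: next_rand_Microsoft | (.[2] % $n), r;
      [0, $seed] | r;

    ($names | length) as $count
    | (.features | length) as $n
    | [limit($n; randoms($count))] as $randoms
    | reduce range(0; $n) as $i (.;
        .features[$i].properties.name = $names[$randoms[$i]])
  ' "$file"
}

# Example (bash): a different sequence on most runs.
#   rename_seeded "$RANDOM" '["name1","name2","name3","name4"]' areas.json
```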
peak
  • Niiice; I'm impressed. :) – Charles Duffy Jan 17 '19 at 23:31
  • @CharlesDuffy -- Yes, except for the typo. (Fixed.) – peak Jan 17 '19 at 23:36
  • You guys are so brilliant, enjoyed having you in the thread all the way and learnt a lot, thanks – Igniter Jan 17 '19 at 23:50
  • Would you and @CharlesDuffy be so kind to share where you would have started with bash lang if you just had to learn it again from a scratch? This is so powerful you both wrote here during your little intellect race :) – Igniter Jan 17 '19 at 23:57
  • @Igniter, I learned from the irc.freenode.org #bash channel; the documentation they maintain is mostly on the Wooledge wiki -- see the [BashFAQ](http://mywiki.wooledge.org/BashFAQ), [BashGuide](https://mywiki.wooledge.org/BashGuide), and [BashPitfalls](http://mywiki.wooledge.org/BashPitfalls) pages. The [bash-hackers' wiki](http://wiki.bash-hackers.org/) is another really good source. peak can speak better than I can to good jq resources. – Charles Duffy Jan 18 '19 at 00:00
  • @CharlesDuffy thanks for that, I appreciate it, wish you both the best – Igniter Jan 18 '19 at 00:11
1

Assuming your areas.json is valid JSON, then I believe the following would come close to accomplishing your intended edit:

names='["name1","name2","name3","name4"]'
jq --argjson names "$names" '.features[].properties.name = $names
  ' < areas.json

However, given your proposed solution, it's not clear to me what you mean by a "random value from a 1d-array". If you mean that the index should be randomly chosen (as by a PRNG), then I would suggest computing it using your favorite PRNG and passing in that random value as another argument to jq, as illustrated in the following section.

So the question becomes how to transform the text

['name1','name2','name3','name4']

into a valid JSON array. There are numerous ways this can be done, whether using jq or not, but I believe that is best left as a separate question or as an exercise, because the choice of method will probably depend on specific details which are not mentioned in this Q. Personally, I'd use sed if possible; you might also consider using hjson, as illustrated in the following section.

Illustration using hjson and awk

hjson -j <<< "['name1','name2','name3','name4']" > names.json.tmp

function randint {
  awk -v n="$(jq length names.json.tmp)" '
    function randint(n) {return int(n * rand())}
    BEGIN {srand(); print randint(n)}'
}

jq --argfile names names.json.tmp --argjson n "$(randint)" '
  .features[].properties.name = $names[$n]
' < areas.json

Addendum

Currently, jq does not have a builtin PRNG, but if you want to use jq and want a value from the "names" array to be chosen at random (with replacement?) for each occurrence of the .name field, then one option would be to pre-compute an array of the randomly selected names (an array of length .features | length) using your favorite PRNG, and to pass that array into jq:

jq --argjson randomnames "$randomnames" '
  reduce range(0; .features|length) as $i (.;
    .features[$i].properties.name = $randomnames[$i])
  ' < areas.json
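
One way to build $randomnames, assuming GNU shuf is available and the names are stored one per line in a file (names.txt below is a hypothetical name), might look like this. The helper name random_names_json is likewise made up for illustration.

```shell
# Hypothetical helper: print a JSON array of COUNT names drawn at random,
# with replacement, from FILE (one name per line).
random_names_json() {
  local count=$1 file=$2
  # shuf -r allows repeats; -n bounds the otherwise endless stream.
  # jq -R turns each raw line into a JSON string; jq -s collects the
  # strings into a single array.
  shuf -r -n "$count" "$file" | jq -R . | jq -s -c .
}

# Example:
#   n=$(jq '.features | length' areas.json)
#   randomnames=$(random_names_json "$n" names.txt)
```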

Another option would be to use a PRNG written in jq, as illustrated elsewhere on this page.

peak
  • This is so powerful, thank you for the solution, I might end up with a solid python function or even js placing all the substitution on clients' shoulders :) – Igniter Jan 17 '19 at 23:50
  • @igniter - The shell `for` loop in your draft solution threw me for a while ... maybe you could modify that bit to make it clear you only want one of the names at a time? – peak Jan 17 '19 at 23:57