I'm using jq to try to merge two JSON files into a single file.

The result is close to what I was looking for, but not just right.

File 1:

{
  "series": "Harry Potter Movie Series",
  "writer": "J.K. Rowling",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "actors": [
        {
          "names": [
            "Emma Watson",
            "Other actor"
          ],
          "other": "Some value"
        }
      ]
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "actors": [
        {
          "names": [
            "Emma Watson"
          ],
          "other": "Some value"
        }
      ]
    }
  ]
}

File 2:

{
  "series": "Harry Potter Movie Series",
  "producer": "David Heyman",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "year": "2001"
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "year": "2002"
    }
  ]
}

Expected result:

{
  "series": "Harry Potter Movie Series",
  "writer": "J.K. Rowling",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "year": "2001",
      "actors": [
        {
          "names": [
            "Emma Watson",
            "Other actor"
          ],
          "other": "Some value"
        }
      ]
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "year": "2002",
      "actors": [
        {
          "names": [
            "Emma Watson"
          ],
          "other": "Some value"
        }
      ]
    }
  ],
  "producer": "David Heyman"
}

Best result I've got so far (only the actors arrays are missing):

{
  "series": "Harry Potter Movie Series",
  "writer": "J.K. Rowling",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "year": "2001"
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "year": "2002"
    }
  ],
  "producer": "David Heyman"
}

Using one of the commands below:

jq -s '.[0] * .[1]' file1 file2

jq --slurp 'add' file1 file2

jq '. * input' file1 file2


If I switch the order of the files, I lose either 'actors' from file1 or 'year' from file2.

How it should work:

  • the elements in file 2 are leading and should replace the matching elements in file 1.
  • the elements in file 1 that don't exist in file 2 (like writer and movies[].actors) shouldn't be deleted.
  • the elements in file 2 that don't exist yet in file 1 should be added (like producer and movies[].year).
  • a title is unique and should by default not occur more than once, but if it does, the duplicates should be removed.

I would assume there is a solution to get these movies arrays perfectly merged with jq.

freljord
  • All of the three commands produce your expected result. Maybe you flipped the order of the files (order matters). If it's the order of the fields within the object bothering you, try adding ` | {series, writer, movies, producer}` to whichever command you prefer (although comparisonwise there is no such thing as an order of fields in an object). You also may want to have a look at [this](https://stackoverflow.com/questions/19529688/how-to-merge-2-json-objects-from-2-files-using-jq) question. – pmf Jun 09 '22 at 12:50
  • Note that the first and the last of the three commands use `*` (not `+`) for a deep merge, while the middle one uses `add` which iterates through the array using `+`, thus it is just a top-level merge. Iteration through the slurped files using `*` would be `jq --slurp 'reduce .[] as $i ({}; . * $i)' file1 file2` (only useful for more than two or a variable number of files, otherwise `.[0] * .[1]` is just as good). – pmf Jun 09 '22 at 13:27
  • Thanks for reaching out - I added 'year' to file2 to point out the problem more specifically. If I switch order of files I either lose 'actors' from file1 or 'year' from file2. – freljord Jun 09 '22 at 14:18
  • You surely do because the latter overwrites the former. If you want arrays (not objects) to be merged, describe the mechanism you envision for such an operation. Should the elements be added up (giving you title twice), should duplicates be removed (what if one file alone already contains duplicates), ...? – pmf Jun 09 '22 at 14:40
  • The values in file 2 will be leading (except for the writer and movies[].actors elements). All the movie elements in file 2 should replace the matching elements in file 1. If 'year' doesn't exist yet in file 1, it should be added. Titles should be unique and may not occur more than once, but if they do, the duplicates should be removed. – freljord Jun 09 '22 at 14:57
  • So you're asking how to merge arrays, but you did not say how you want to merge arrays. Will `.movies[0]` always correspond to the same movie in both files? Will `.actors[0]` (for a given movie) always correspond to the same actor? What about `.names` for actors? – ikegami Jun 09 '22 at 15:24
  • That's right - `.movies[0]` always corresponds to the same movie in both files (the title is unique and can be used for matching if needed). Furthermore, `.actors[0]` and `.names` will never appear in file 2, but should correspond to the right movie (`.title`). – freljord Jun 09 '22 at 16:03

1 Answer

You are looking for a solution that "merges" objects and arrays. For the former you have already found `+` (or `add`) for a top-level merge, and `*` for a recursive merge, but merging arrays (namely the two `.movies` fields) needs more specification from your end as there is no canonical solution for that.
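
To make the difference concrete, here is a minimal sketch on two made-up objects (the variables `a` and `b` and their contents are assumptions for illustration, not taken from the question's files). It shows that `+` replaces a nested object entirely, `*` merges it recursively, and both replace an array with the right-hand one:

```shell
# Two small hypothetical objects, each with a nested object "o" and an array "l".
a='{"o": {"x": 1}, "l": [1]}'
b='{"o": {"y": 2}, "l": [2]}'

# + merges only the top level: the right-hand "o" replaces the left-hand one.
plus=$(jq -cn --argjson a "$a" --argjson b "$b" '$a + $b')

# * recurses into nested objects, but arrays are still replaced as a whole.
star=$(jq -cn --argjson a "$a" --argjson b "$b" '$a * $b')

echo "$plus"
echo "$star"
```

In both results `l` ends up as `[2]`, which is the same effect that makes `actors` or `year` disappear depending on the file order.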

In a comment you state

.movies[0] always correspond to the same movie in both files

This enables you to use `transpose` to align the items from both arrays, and then apply object-merging on each pair of corresponding items. If you want to merge deeper arrays as well (e.g. `.movies[].actors` or `.movies[].actors[].names`) you need to extend this approach accordingly. Here's a solution using plain `add` for the merging of the array items as well as of the other top-level fields:

jq -s 'add + {movies: map(.movies) | transpose | map(add)}' file1 file2
{
  "series": "Harry Potter Movie Series",
  "writer": "J.K. Rowling",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "actors": [
        {
          "names": [
            "Emma Watson",
            "Other actor"
          ],
          "other": "Some value"
        }
      ],
      "year": "2001"
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "actors": [
        {
          "names": [
            "Emma Watson"
          ],
          "other": "Some value"
        }
      ],
      "year": "2002"
    }
  ],
  "producer": "David Heyman"
}
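
If the positional correspondence cannot be relied on, the question also states that titles are unique, so the pairing could instead be keyed on `.title`. The following is only a sketch under that assumption, not a canonical way to merge arrays: the temporary file names and the trimmed-down sample data are made up for illustration, and file 2 deliberately lists the movies in a different order.

```shell
# Hypothetical, trimmed-down copies of the two input files in a temp directory.
tmp=$(mktemp -d)
cat > "$tmp/file1.json" <<'EOF'
{"series": "Harry Potter Movie Series", "writer": "J.K. Rowling",
 "movies": [
   {"title": "Harry Potter and the Philosopher's Stone",
    "actors": [{"names": ["Emma Watson", "Other actor"]}]},
   {"title": "Harry Potter and the Chamber of Secrets",
    "actors": [{"names": ["Emma Watson"]}]}
 ]}
EOF
cat > "$tmp/file2.json" <<'EOF'
{"series": "Harry Potter Movie Series", "producer": "David Heyman",
 "movies": [
   {"title": "Harry Potter and the Chamber of Secrets", "year": "2002"},
   {"title": "Harry Potter and the Philosopher's Stone", "year": "2001"}
 ]}
EOF

merged=$(jq -s '
  add + {
    movies: (
      map(.movies) | add                     # one flat list: file 1 items, then file 2 items
      | group_by(.title)                     # one group per title (groups come out sorted by title)
      | map(reduce .[] as $m ({}; . * $m))   # deep-merge every group into a single object
    )
  }
' "$tmp/file1.json" "$tmp/file2.json")

echo "$merged"
rm -r "$tmp"
```

Because every group is collapsed into one object, a title that occurs more than once (even within a single file) ends up only once in the result. Two caveats: `group_by` returns the movies sorted by title rather than in their original order, and for conflicting fields the sketch relies on items within a group keeping their input order, so that file 2 overrides file 1.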

pmf