
I have an array of objects and I want to remove the duplicates from it, but I still want to keep a count of the duplicates.

My input is:

[
    {
        "foo": 1,
        "bar": "a",
        "baz": "whatever"
    },
    {
        "foo": 1,
        "bar": "a",
        "baz": "hello"
    },
    {
        "foo": 1,
        "bar": "b",
        "baz": "world"
    }
]

(Not sure if it's important, but the uniqueness of an object is based on foo and bar, not baz.)

An example of desired output would then be:

[
    {
        "foo": 1,
        "bar": "a",
        "baz": "whatever",
        "count": 2
    },
    {
        "foo": 1,
        "bar": "b",
        "baz": "world",
        "count": 1
    }
]

or even:

[
    {
        "count": 2,
        "data": {
            "foo": 1,
            "bar": "a",
            "baz": "whatever"
        }
    },
    ...
]

I know how to do the uniqueness part (with unique_by([.foo, .bar])) but not the counting part.

GrecKo

2 Answers


You can use the following command based on group_by:

group_by(.foo,.bar)
| map(.[]+{"count":length})
| unique_by(.foo,.bar)

Output:

[
  {
    "foo": 1,
    "bar": "a",
    "baz": "whatever",
    "count": 2
  },
  {
    "foo": 1,
    "bar": "b",
    "baz": "world",
    "count": 1
  }
]
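For comparison, here is a minimal Python sketch of the same idea, assuming the input has already been parsed (e.g. with json.loads) into a list of dicts. It sorts by the (foo, bar) key and groups adjacent items, roughly what group_by does, then keeps the first item of each group plus a count. The name count_duplicates is hypothetical, and the sketch assumes the foo values (and the bar values) are mutually comparable, because Python, unlike jq, cannot sort mixed types such as 1 and "a".

```python
from itertools import groupby
from operator import itemgetter

# Uniqueness is based on foo and bar only; baz is ignored.
foo_bar_key = itemgetter("foo", "bar")


def count_duplicates(items):
    """Keep the first item of each (foo, bar) group and add a "count" field."""
    result = []
    # Sort so equal keys are adjacent; groupby only merges adjacent items.
    # sorted() is stable, so group[0] is the earliest matching input item.
    for _, group in groupby(sorted(items, key=foo_bar_key), key=foo_bar_key):
        group = list(group)
        # Build a new dict so the input objects are not modified.
        result.append({**group[0], "count": len(group)})
    return result


data = [
    {"foo": 1, "bar": "a", "baz": "whatever"},
    {"foo": 1, "bar": "a", "baz": "hello"},
    {"foo": 1, "bar": "b", "baz": "world"},
]
print(count_duplicates(data))
# [{'foo': 1, 'bar': 'a', 'baz': 'whatever', 'count': 2}, {'foo': 1, 'bar': 'b', 'baz': 'world', 'count': 1}]
```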

The other output you mentioned can be achieved with this command:

group_by(.foo,.bar)
| map({"count":length,"data":(unique_by(.foo,.bar)[])})

Output:

[
  {
    "count": 2,
    "data": {
      "foo": 1,
      "bar": "a",
      "baz": "whatever"
    }
  },
  {
    "count": 1,
    "data": {
      "foo": 1,
      "bar": "b",
      "baz": "world"
    }
  }
]
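The nested count/data shape can be sketched the same way in Python. count_with_data is a hypothetical name, and the same assumptions apply (input already parsed into a list of dicts, comparable foo and bar values).

```python
from itertools import groupby
from operator import itemgetter

foo_bar_key = itemgetter("foo", "bar")


def count_with_data(items):
    """Return [{"count": n, "data": first_item}, ...], one entry per (foo, bar)."""
    result = []
    for _, group in groupby(sorted(items, key=foo_bar_key), key=foo_bar_key):
        group = list(group)
        # Like taking .[0] of each group in jq: keep the first item as "data".
        result.append({"count": len(group), "data": group[0]})
    return result


data = [
    {"foo": 1, "bar": "a", "baz": "whatever"},
    {"foo": 1, "bar": "a", "baz": "hello"},
    {"foo": 1, "bar": "b", "baz": "world"},
]
print(count_with_data(data))
```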
hek2mgl
  • Thanks, it works! For the second output, wouldn't it be simpler to use `"data": first`? – GrecKo Oct 19 '17 at 12:41
  • Yeah, can be used too. And would be simpler! – hek2mgl Oct 19 '17 at 13:56
  • Oh yes! Removed. – hek2mgl Oct 19 '17 at 20:37
  • There's no need for `unique_by` in either answer. For the first approach, the following is enough: `group_by(.foo,.bar) | map(.[0]+{count:length})`. For the second: `group_by(.foo,.bar) | map({"count":length,"data": .[0]})` – peak Oct 20 '17 at 01:56

Here is a solution which uses peak's GROUPS_BY instead of group_by/1 to avoid sorting:

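def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x]) | .[];

# GROUPS_BY emits a stream of groups; wrap in [...] to get an array as in the desired output
[ GROUPS_BY(.[]; {foo,bar}|tostring)
  | .[0].count = length   # length of the whole group
  | .[0] ]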
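A rough Python analogue of this unsorted approach: a single pass that buckets items in a dict keyed by (foo, bar). Since a tuple is hashable, the key needs no tostring-style conversion (jq object keys must be strings, which is why the jq version stringifies). Python dicts preserve insertion order (3.7+), so groups come out in the order their key first appears. groups_by and count_duplicates_unsorted are hypothetical names, and the sketch assumes the foo and bar values are hashable.

```python
from operator import itemgetter


def groups_by(items, f):
    """Bucket items by f(item) in a single pass, without sorting."""
    buckets = {}
    for x in items:
        buckets.setdefault(f(x), []).append(x)
    # dicts keep insertion order, so buckets follow first appearance of each key
    return list(buckets.values())


def count_duplicates_unsorted(items):
    """Keep the first item of each (foo, bar) group and add a "count" field."""
    result = []
    for group in groups_by(items, itemgetter("foo", "bar")):
        # Build a new dict so the input objects are not modified.
        result.append({**group[0], "count": len(group)})
    return result


data = [
    {"foo": 1, "bar": "a", "baz": "whatever"},
    {"foo": 1, "bar": "a", "baz": "hello"},
    {"foo": 1, "bar": "b", "baz": "world"},
]
print(count_duplicates_unsorted(data))
```

Compared with the sort-based version, this runs in roughly linear time and keeps the input order, at the cost of requiring hashable rather than orderable keys.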
jq170727