
I have the following JSON:

{
    "dataset_1": {
        "size_in_mb": 0.5,
        "task": "clean",
        "tags": ["apple", "banana", "strawberry"]
    },
    "dataset_2": {
        "size_in_mb": 100,
        "task": "split",
        "tags": ["apple"]
    },
    "dataset_3": {
        "size_in_mb": 1024,
        "task": "clean",
        "tags": ["strawberry"]
    }
}

How do I:

  1. get the datasets that have a tag called "apple"
  2. get the datasets that are larger than 500 MB
  3. get the datasets whose task is "split"

I am able to query the properties of a dataset, but I am not able to extract the name of a dataset that has a certain property. For example, I can get ["strawberry"], but not ["dataset_1", "dataset_3"] when "tags" contains "strawberry".

This question comes close, but basically says you can't do it with JMESPath.

dparkar
    jfyi ... i ended up changing the schema a little .. moved to array format instead of object format .. added "name" as another element next to "task" – dparkar Mar 15 '19 at 18:27

1 Answer


You figured this one out

  • As you stated in a comment, re-normalizing the original dataset into a list of records (instead of using object keys for the top-level entries) is usually the best approach if you want to run general-purpose queries with JMESPath.

  • The Stack Overflow post that you linked to goes into a little more detail on this.
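The re-normalization itself can be done mechanically. Here is a minimal sketch in plain Python, assuming the original object-keyed layout from the question. The function name `renormalize` and the wrapper key `dataroot` are illustrative choices, not part of JMESPath or of any library:

```python
import json

# Original layout from the question: dataset names are object keys.
before = {
    "dataset_1": {"size_in_mb": 0.5, "task": "clean",
                  "tags": ["apple", "banana", "strawberry"]},
    "dataset_2": {"size_in_mb": 100, "task": "split",
                  "tags": ["apple"]},
    "dataset_3": {"size_in_mb": 1024, "task": "clean",
                  "tags": ["strawberry"]},
}


def renormalize(keyed, root_key="dataroot", name_key="name"):
    """Turn {name: {...}} into {root_key: [{name_key: name, ...}, ...]}.

    Hypothetical helper: each former key becomes a "name" field on its record,
    so the name can be selected like any other property.
    """
    return {root_key: [{name_key: name, **props} for name, props in keyed.items()]}


after = renormalize(before)
print(json.dumps(after, indent=2))
```

The input dictionary is left untouched, and the records keep the order of the original keys.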

Before and After re-normalizing the dataset

  • For the benefit of those who want more detail on what you meant by "i ended up changing the schema a little", here is a "before and after" example of what that can look like:

Before

  {
      "dataset_1": {
          "size_in_mb": 0.5,
          "task": "clean",
          "tags": ["apple", "banana", "strawberry"]
      },
      "dataset_2": {
          "size_in_mb": 100,
          "task": "split",
          "tags": ["apple"]
      },
      "dataset_3": {
          "size_in_mb": 1024,
          "task": "clean",
          "tags": ["strawberry"]
      }
  }

After

  {"dataroot":[
      {
          "name":      "dataset_1",
          "size_in_mb": 0.5,
          "task": "clean",
          "tags": ["apple", "banana", "strawberry"]
      },
      {
          "name":      "dataset_2",
          "size_in_mb": 100,
          "task": "split",
          "tags": ["apple"]
      },
      {
          "name":      "dataset_3",
          "size_in_mb": 1024,
          "task": "clean",
          "tags": ["strawberry"]
      }
  ]}
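With the list-of-records layout, the three questions can each be written as a filter followed by a projection of `name`. The sketch below assumes the third-party `jmespath` Python package (`pip install jmespath`) and falls back to a message if it is not installed. The dictionary labels are made up for illustration, and dataset_2 carries only the "apple" tag, as in the original question:

```python
import importlib.util

# The re-normalized document: a list of records, each with its own "name".
data = {
    "dataroot": [
        {"name": "dataset_1", "size_in_mb": 0.5, "task": "clean",
         "tags": ["apple", "banana", "strawberry"]},
        {"name": "dataset_2", "size_in_mb": 100, "task": "split",
         "tags": ["apple"]},
        {"name": "dataset_3", "size_in_mb": 1024, "task": "clean",
         "tags": ["strawberry"]},
    ]
}

# One JMESPath expression per question: filter the list, then project "name".
# Backticks delimit a JSON literal (the number 500); single quotes a raw string.
queries = {
    "has_apple_tag": "dataroot[?contains(tags, 'apple')].name",
    "over_500_mb": "dataroot[?size_in_mb > `500`].name",
    "task_is_split": "dataroot[?task == 'split'].name",
}

results = {}
if importlib.util.find_spec("jmespath") is not None:
    import jmespath

    # jmespath.search(expression, data) evaluates one expression against data.
    results = {label: jmespath.search(expr, data) for label, expr in queries.items()}
    for label, names in results.items():
        print(label, names)
else:
    print("jmespath is not installed; try: pip install jmespath")
```

Swapping `'apple'` for `'strawberry'` in the first expression should give the `["dataset_1", "dataset_3"]` result the question asks about.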
dreftymac