
I am relatively new to Spark and Java programming. Given a JSON file with nested objects, I need to flatten its structure (i.e. denormalize the contents) and load it into Elasticsearch using Spark.

For instance,

if the contents of my example.json are:

{
  "title": "Nest eggs",
  "body":  "Making your money work...",
  "tags":  [ "cash", "shares" ],
  "comments": 
    {
      "name":    "John Smith",
      "comment": "Great article",
      "age":     28,
      "stars":   4,
      "date":    "2014-09-01"
    },
  "owner":
    {
      "name":    "John Smith",
      "age":     28
    }
}

I would want to restructure this into the format below and load it into ES using Spark.

{
  "title": "Nest eggs",
  "body":  "Making your money work...",
  "tags":  [ "cash", "shares" ],
  "comments_name": "John Smith",
  "comments_comment": "Great article",
  "comments_age":     28
  "comments_stars":   4,
  "comments_date":    "2014-09-01"
  "owner_name": "John Smith",
  "owner_age":     28,
 }

If one of the nested objects is empty, the corresponding flattened fields can be left empty as well.

Any help is appreciated. Thanks.

  • Semantically, there's no difference between `"comments.name"` (a field `name` nested inside a `comments` field) and `"comments_name"` (a top-level field) – Val Apr 28 '17 at 14:10
  • Agreed. It can be in either form. I want it to be flattened JSON before loading it into the index in ES. – fledgling Apr 28 '17 at 14:19

1 Answer


The answer you are looking for is here.

To summarize, you can simply select the nested fields you need via dot notation.

import sqlContext.implicits._   // needed for the $"column" syntax outside spark-shell

val df = sqlContext.read.json(json)
val flattened = df.select($"title", $"comments.name")