2

I am trying different ways to query a record within a array of records and display complete Row as output.

I dont know which nested Object has String "pg". But i want to query on particular object. Whether the object has "pg" or not. If "pg" exist then i want to display that complete row. How to write "spark sql query" on nested objects without specfying the object index.So i dont want to use the index of children.name

My Avro Record:

{
"name": "Parent",
"type":"record",
"fields":[
    {"name": "firstname", "type": "string"},

    {
        "name":"children",
        "type":{
            "type": "array",
            "items":{
                        "name":"child",
                        "type":"record",
                        "fields":[
                            {"name":"name", "type":"string"}
                        ]
                    }
            }
    }
]
}

I am using Spark SQL context to query dataframe which is read. So if input is

Row no   Firstname Children.name
    1    John       Max
                    Pg
    2    Bru        huna
                    aman

Output should return poq 1 since it has row where one object of children.name is pg.

val results = sqlc.sql("SELECT firstname, children.name FROM nestedread where children.name = 'pg'")
results.foreach(x=> println(x(0), x(1).toString))

The above query doesn't work. but it works when i query children[1].name.

I ALSO want to know that if i can filter a set of records and then explode. Instead of first explode and create large number of rows and then filter.

ZygD
  • 22,092
  • 39
  • 79
  • 102
pg20
  • 339
  • 6
  • 14
  • children is an Array, children.name does not mean anything. children[1].name means you access to the first object in that array, and you read name attribute. – RoyaumeIX May 17 '16 at 08:47
  • Yes. But i want to query on particular object. Whether the object has pg or not. If pg exist then i want to display that complete row. How to write "sql query" on nested objects without specfying the object index. – pg20 May 17 '16 at 09:56
  • Have you checked this one : http://bender.io/2013/09/22/returning-hierarchical-data-in-a-single-sql-query/ ? – RoyaumeIX May 17 '16 at 09:58
  • Also : http://stackoverflow.com/questions/29108179/jsonb-query-with-nested-objects-in-an-array – RoyaumeIX May 17 '16 at 09:59
  • If the answer provided solves your question please accept it to close the question ! – eliasah May 23 '16 at 05:11
  • Hi @eliasah I have answered my q. But where to close my question? – pg20 May 23 '16 at 05:24
  • There is an accept button next to the question under the votes – eliasah May 23 '16 at 05:26

2 Answers2

5

It seems that you can use

org.apache.spark.sql.functions.explode(e: Column): Column

for example in my project(in java), i have nested json like this:

{
    "error": [],
    "trajet": [
        {
            "something": "value"
        }
    ],
    "infos": [
        {
            "something": "value"
        }
    ],
    "timeseries": [
        {
            "something_0": "value_0",
            "something_1": "value_1",
            ...
            "something_n": "value_n"
        }
    ]
}

and i wanted to analyse datas in "timeseries", so i did:

DataFrame ts = jsonDF.select(org.apache.spark.sql.functions.explode(jsonDF.col("timeseries")).as("t"))
                     .select("t.something_0",
                             "t.something_1",
                             ...
                             "t.something_n");

I'm new to spark too. Hope this could give you a hint.

Yuan JI
  • 2,927
  • 2
  • 20
  • 29
  • Hi, thanxs. I found a way through Explode. val results = sqlc.sql("SELECT firstname, child.name, FROM parent LATERAL VIEW explode(children) childTable AS child ") – pg20 May 20 '16 at 12:49
  • I ALSO want to know that if i can filter a set of records and then explode. Instead of first explode and create large number of rows and then filter. – pg20 May 20 '16 at 12:55
1

The problem was solved by

I found a way through Explode.

val results = sqlc.sql("SELECT firstname, child.name, FROM parent LATERAL VIEW explode(children) childTable AS child 
pg20
  • 339
  • 6
  • 14