
I have recently started working with JSON files and processing data using Pig scripts. I am using Pig version 0.9.3. I came across PiggyBank, which I thought would be useful for loading and processing JSON files in Pig scripts.

I built piggybank.jar with Ant. Then I compiled the Java file and updated piggybank.jar. I was trying to run the given example JSON file.

I have written a simple Pig script and the corresponding JSON file, as follows:

REGISTER piggybank.jar;
a = LOAD 'file3.json' USING org.apache.pig.piggybank.storage.JsonLoader() AS (json:map[]);
b = FOREACH a GENERATE FLATTEN(json#'menu') AS menu;
c = FOREACH b GENERATE FLATTEN(menu#'popup') AS popup;
d = FOREACH c GENERATE FLATTEN(popup#'menuitem') AS item;
e = FOREACH d GENERATE FLATTEN(item#'value') AS val;
DUMP e;

file3.json
{ "menu" : {
    "id" : "file",
    "value" : "File",
    "popup": {
      "menuitem" : [
        {"value" : "New", "onclick": "CreateNewDoc()"},
        {"value" : "Open", "onclick": "OpenDoc()"},
        {"value" : "Close", "onclick": "CloseDoc()"}
      ]
    }
 }}

I get the following exception during runtime:

org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error while reading input - Could not json-decode string: { "menu" : {
    at org.apache.pig.piggybank.storage.JsonLoader.parseStringToTuple(JsonLoader.java:127)

Pig log file:

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias e

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias e
        at org.apache.pig.PigServer.openIterator(PigServer.java:901)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:561)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:893)
        ... 12 more
================================================================================   
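The exception shows that the loader handed only the first line, `{ "menu" : {`, to the JSON decoder, which suggests it treats each input line as a separate record. Assuming that is the case, one workaround is to rewrite the file so each JSON document sits on a single line before loading it. Below is a minimal Python sketch of that preprocessing step; the function name `to_json_lines` and the command-line usage are hypothetical and not part of Pig or PiggyBank.

```python
import json
import sys


def to_json_lines(text):
    """Return every JSON document in `text` as compact single-line JSON.

    Works for pretty-printed documents that span several lines and for
    several documents concatenated in one file. The output has one
    document per line, each followed by a newline.
    """
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse one complete document starting at `pos`; `pos` moves to its end.
        obj, pos = decoder.raw_decode(text, pos)
        records.append(json.dumps(obj, separators=(",", ":")))
    return "".join(record + "\n" for record in records)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python to_json_lines.py file3.json > file3_oneline.json
    with open(sys.argv[1]) as src:
        sys.stdout.write(to_json_lines(src.read()))
```

Run on file3.json, this would produce a file whose only line starts with `{"menu":{"id":"file",`; the LOAD statement would then point at that file instead.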

Please let me know what I am doing wrong. Thanks.

Logan
For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution), here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 14:48

1 Answer


You can handle nested json loading with Twitter's Elephant Bird: https://github.com/kevinweil/elephant-bird

a = LOAD 'file3.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

This parses the JSON into a map (http://pig.apache.org/docs/r0.11.1/basic.html#map-schema), and each JSONArray is parsed into a DataBag of maps.
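To make the nesting concrete, here is a plain-Python analogy (not Pig or Elephant Bird code) of the lookups the question's script performs once the file is loaded as nested maps: a dict stands in for a Pig map and a list stands in for the DataBag built from the JSON array. It assumes the record has already been collapsed onto one line; the name `menu_values` is hypothetical.

```python
import json

# The question's sample document, collapsed onto a single line.
RECORD = ('{"menu":{"id":"file","value":"File","popup":{"menuitem":['
          '{"value":"New","onclick":"CreateNewDoc()"},'
          '{"value":"Open","onclick":"OpenDoc()"},'
          '{"value":"Close","onclick":"CloseDoc()"}]}}}')


def menu_values(line):
    """Mirror json#'menu', menu#'popup', popup#'menuitem', then item#'value'."""
    doc = json.loads(line)        # JSON object -> map (a dict here)
    menu = doc["menu"]            # json#'menu'
    popup = menu["popup"]         # menu#'popup'
    items = popup["menuitem"]     # JSON array -> bag of maps (a list here)
    # Flattening the bag gives one row per item; take each item's 'value'.
    return [item["value"] for item in items]


print(menu_values(RECORD))
```

Under these assumptions the values the script is after are New, Open and Close, one per menu item.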

dranxo