I am not able to read a JSON file using ElephantBird and Pig. I want to know where I am making a mistake.

Data:

{ "nrcpts": "1",
  "src": "info@example.com",
  "sendmailid": "p6D0r0u1006229",
  "relay": "app03.example.com",
  "classnumber": "0",
  "msgid": "WARQZCXAEMSSVWPPOOYZXR
LQIKMFUY.155763@example.com",
  "pid": "6229",
  "month": "Jul",
  "time": "20:53:00",
  "day": "12",
  "mailserver": "mail5",
  "size": "57395"
}

Code:

json1 = load '/user/hdetl/funnel/uetsample.dat' using com.twitter.elephantbird.pig.load.JsonLoader();

dat   = FOREACH json1 GENERATE $0#'mailserver' AS mailserver;
dump dat;

Error:

Input(s):
Failed to read data from "/user/hdetl/funnel/uetsample.dat"

detailed error :
Pig Stack Trace
---------------
ERROR 2997: Unable to recreate exception from backed error: Error: in

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias dat. Backend error : Unable to recreate exception from backed error: Error: in
        at org.apache.pig.PigServer.openIterator(PigServer.java:891)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:495)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: Error: in
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:344)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
        at org.apache.pig.PigServer.storeEx(PigServer.java:996)
        at org.apache.pig.PigServer.store(PigServer.java:963)
        at org.apache.pig.PigServer.openIterator(PigServer.java:876)
Dennis Jaheruddin
Bharathi
  • Environment and registered jars:
    export JAVA_HOME=/usr/java/jdk1.6.0_22
    export PIG_CLASSPATH=/etc/hadoop/conf
    export PATH=$PATH:/local/hdetl/pig-0.9.2/bin
    REGISTER /local/hdetl/funnel/pig-jars/json-simple-1.1.jar;
    REGISTER /local/hdetl/funnel/pig-jars/google-collect-1.0.jar;
    REGISTER '/local/hdetl/funnel/pig-jars/elephant-bird-1.2.1-SNAPSHOT.jar';
    – Bharathi Feb 24 '12 at 22:25
  • I'm not able to read a JSON file using ElephantBird and Pig. I want to know where I'm making the mistake. – Bharathi Feb 24 '12 at 22:55
  • If you have additional info (like what the question actually is), please update the question, rather than only putting it in the comments. I will fix it now. -- For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 14:41

2 Answers

This is quite an old post, but someone may have a similar problem.

I created an input file from the data provided in the question.
I couldn't load that file because of a stray line break in this line:

"msgid": "WARQZCXAEMSSVWPPOOYZXR
LQIKMFUY.155763@example.com",

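But fixing that alone didn't give the expected result. I then removed all line breaks from the file, so that the whole record ended up on a single line, which suggests the loader expects each JSON record on one line.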

File was loaded:

dump json1
([time#20:53:00,msgid#WARQZCXAEMSSVWPPOOYZXRLQIKMFUY.155763@example.com,relay#app03.example.com,mailserver#mail5,month#Jul,pid#6229,classnumber#0,day#12,src#info@example.com,sendmailid#p6D0r0u1006229,nrcpts#1,size#57395])

and your FOREACH works:

dat   = FOREACH json1 GENERATE $0#'mailserver' AS mailserver;
dump dat

(mail5)
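
The cleanup described above (removing every line break so the record sits on one line) can also be done with a small script instead of by hand. The following is only a minimal sketch in Python, not part of the original answer: the function name `flatten_record` is hypothetical, the sample is a shortened version of the record from the question, and it assumes the input holds exactly one JSON object.

```python
import json


def flatten_record(text: str) -> str:
    """Collapse one pretty-printed JSON object into a single line.

    Assumption: `text` contains exactly one JSON object.
    """
    # Drop every line break, including the stray one inside the "msgid" value.
    joined = text.replace("\r", "").replace("\n", "")
    # Parse to confirm the result is valid JSON, then re-serialize compactly.
    record = json.loads(joined)
    return json.dumps(record, separators=(",", ":"))


# Shortened sample based on the record in the question.
sample = '''{ "nrcpts": "1",
  "msgid": "WARQZCXAEMSSVWPPOOYZXR
LQIKMFUY.155763@example.com",
  "mailserver": "mail5"
}'''

line = flatten_record(sample)
print(line)
```

The output of such a script would then be written to the file that the Pig script loads. A file holding several records would need a different approach, since stripping all line breaks would merge them into one line.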
psmith

I haven't used the JSON loader, but I would imagine you should be able to drop the $0 in your FOREACH. I'm just going off the belief that the loader turns everything between { and } into a single record (tuple).

dat   = FOREACH json1 GENERATE mailserver;
NerdyNick