
I am running the following commands in Pig. My data set has one row per student in a class, and each student has a number of grades. The student name is separated from that student's grades by a tab, and the grades themselves are comma separated. I need to find the average grade for each student. After grouping, I can get the count of grades for each student, but I cannot get the average. Pig reports that it cannot open the iterator when computing the average. I am confused, because the aggregate functions COUNT and AVG are both applied to the same bag. I am not sure what I am missing. Any help is appreciated.

Scripts:

grunt> A = LOAD 'grades.txt' USING PigStorage('\t') AS (f1:chararray,f2:chararray);
grunt> dump A;
(s14,59,94,81)
(s15,60,77)
(s16,77,77)
(s17,76,76)
(s18,19,61,72)
(s20,34,35)

grunt> B = foreach A generate f1 as stu, Flatten(TOKENIZE(f2)) as (grade:int);
grunt> describe B;
B: {stu: chararray,grade: int}
grunt> dump B;
(s14,59)
(s14,94)
(s14,81)
(s15,60)
(s15,77)
(s16,77)
(s16,77)
(s17,76)
(s17,76)
(s18,19)
(s18,61)
(s18,72)
(s20,34)
(s20,35)
grunt> grp = group B by stu;
grunt> cnt = foreach grp generate group, COUNT(B.grade);
grunt> dump cnt;
(s14,3)
(s15,2)
(s16,2)
(s17,2)
(s18,3)
(s20,2)
grunt> avg = foreach grp generate group, AVG(B.grade);
grunt> dump avg;
2015-03-20 21:56:30,900 ERROR org.apache.pig.tools.pigstats.PigStatsUtil: 1 map reduce job(s) failed!
2015-03-20 21:56:30,907 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 1066: Unable to open iterator for alias avg
Details at logfile: /home/training/pig/pig_1426902869706.log
grunt>
SKRahimi
  • I used the following workaround. It definitely points to a bug in Pig. I changed "B = foreach A generate f1 as stu, Flatten(TOKENIZE(f2)) as (grade:int)" to "B = foreach A generate f1 as stu, Flatten(TOKENIZE(f2)) as grade" and then cast the field in a second step: "C = foreach B generate stu as stu, (int)grade as grade;" – SKRahimi Mar 23 '15 at 23:40
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 14:59

1 Answer


As mentioned in the comments, a workaround was found.

Change

B = foreach A generate f1 as stu, Flatten(TOKENIZE(f2)) as (grade:int)

to

B = foreach A generate f1 as stu, Flatten(TOKENIZE(f2)) as grade

and then cast the field to int in a separate relation:

C = foreach B generate stu as stu, (int)grade as grade;
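
For reference, the whole workaround can be written as one script. This is a minimal sketch, assuming the same grades.txt layout as in the question. The aliases raw, tokens, typed, by_stu and avg_by_stu are names chosen here for illustration and do not appear in the original session. One plausible reason the original fails (an assumption, not confirmed by the log shown in the question) is that declaring int in the AS clause only relabels the schema without converting the chararray tokens, so COUNT, which ignores the values, succeeds while AVG fails at run time.

```pig
-- Load one row per student: name, then a comma-separated grade string.
raw = LOAD 'grades.txt' USING PigStorage('\t') AS (stu:chararray, grades:chararray);

-- TOKENIZE splits on its default delimiters, which include the comma.
-- Leave the flattened token untyped here (it is still a chararray).
tokens = FOREACH raw GENERATE stu, FLATTEN(TOKENIZE(grades)) AS grade;

-- Cast explicitly in a separate step so the values really are ints.
typed = FOREACH tokens GENERATE stu, (int)grade AS grade;

-- Group by student and average the typed grades.
by_stu = GROUP typed BY stu;
avg_by_stu = FOREACH by_stu GENERATE group AS stu, AVG(typed.grade) AS avg_grade;
DUMP avg_by_stu;
```

As a sanity check against the sample data, s15 has the grades 60 and 77, so its average is (60 + 77) / 2 = 68.5.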
Dennis Jaheruddin