21
A = load 'a.txt' as (id, a1);
B = load 'b.txt as (id, b1);
C = join A by id, B by id;
D = foreach C generate id,a1,b1;
dump D;

4th line fails on: Invalid field projection. Projected field [id] does not exist in schema

I tried to change to A.id but then the last line fails on: ERROR 0: Scalar has more than one row in the output.

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
ihadanny
  • 4,377
  • 7
  • 45
  • 76

1 Answers1

49

What you are looking for is the "Disambiguate Operator". What you want is A::id, not A.id.

A.id says "there is a relation/bag A and there is a column called id in its schema"

A::id says "there is a record from A and that has a column called id"

So, you would do:

A = load 'a.txt' as (id, a1);
B = load 'b.txt as (id, b1);
C = join A by id, B by id;
D = foreach C generate A::id,a1,b1;
dump D;

A dirty alternative:

Just because I'm lazy, and disambiguation gets really weird when you start doing multiple joins one after another: use unique identifiers.

A = load 'a.txt' as (ida, a1);
B = load 'b.txt as (idb, b1);
C = join A by ida, B by idb;
D = foreach C generate ida,a1,b1;
dump D;
Donald Miner
  • 38,889
  • 8
  • 95
  • 118
  • Thanks for the explanation. Adding a link to this [related question](http://stackoverflow.com/questions/6877288/pig-error-1066-after-iterating-through-a-joined-set). – shoover Dec 18 '13 at 18:13
  • If I have any doubts what is the name of a field, I just use DESCRIBE. In your case DESCRIBE C; gives us C: {A::id: bytearray,A::a1: bytearray,B::id: bytearray,B::b1: bytearray}. Now you can easily find proper field name. – psmith Feb 26 '14 at 07:15
  • Is there any way to access the names of the fields taken from 'A' within the join result dynamically via some sort of introspection? For example, what if I don't know the schema of 'A' until runtime and then I want the resulting projection on line 4 to only include the columns from 'A' and not 'B'? Thanks. – nweiler Aug 14 '15 at 19:47