0

File 1 has the data:

Name ID 
-------
Mark 1
Gary 2
Robert 3

File 2 has the data:

ID  result
----------
1   success
2   Fail 
3   success

I loaded the data into two variables a & b now I want to join the data in based on ID for which result is success. I am able to join but I am getting the data in an improper format.

a = load '/file1' as (Name:chararray,ID:int);
b = load '/file2' as (ID:int,result:chararray);
c = join a by a2, b by b1;

When I dump c I am getting the output in the format of (name,id,id,result)... How I need join a & b such that I can get the output in the format of (name,id,result)

RiaD
  • 46,822
  • 11
  • 79
  • 123
Venkat
  • 73
  • 3
  • 12

2 Answers2

2

You can't. What you have to do is project the fields that you want to keep using a FOREACH. You can do something like this:

D = FOREACH C GENERATE a::Name as Name, a::ID as ID, b::result as result ;
mr2ert
  • 5,146
  • 1
  • 21
  • 32
  • Yea. I did in that way. but I want to join the two files and get the id's which are success. We can't do join like that (name, id, result) ? – Venkat Aug 06 '13 at 18:58
  • Well, you are getting the ids which are successes. The issue is that they have two `ID` fields, when it makes more sense that the two `ID`s be merged into one. The reason why it keeps both the `ID`s after the join is so that the output of the `inner join` (which is what you are doing), follows the same output as an `outer join`. See this [chart](http://stackoverflow.com/a/16598900/2336149). – mr2ert Aug 06 '13 at 19:11
  • Ok. For this example ...Even if we do inner or outer join the output will be the same. – Venkat Aug 06 '13 at 19:32
  • For the sample yes. Since there are no `ID`s that are unique to either File 1 or File 2. – mr2ert Aug 06 '13 at 19:39
2

You can filter b before joining.

a = load '/file1' as (Name:chararray,ID:int);
b = load '/file2' as (ID:int,result:chararray);
z = FILTER b BY b2 == 'success';

Then join a and z. c = join a by a2, z by b1;

Later you need to do something as mentioned by @m2ert in previous answer.