I have the following use case.

map1.csv

col0|col1
10|a1
20|b1

map2.csv

col1|col2|col3|col4
a1|aa|ab|ac
a1|ba|bb|bc
a1|ca|cb|cc
b1|mm|mn|mo
b1|xy|yz|xz

I need to join map1.csv with map2.csv on col1. For each col1 value that matches (say a1), I need to take the values of col2, col3 and col4 from every matching row and store them as a list of maps, with the keys hardcoded as col2, col3 and col4.

Expected Output:

10|a1|[{"col2": "aa","col3": "ab","col4": "ac"},{"col2": "ba","col3": "bb","col4": "bc"},{"col2": "ca","col3": "cb","col4": "cc"}]
20|b1|[{"col2": "mm","col3": "mn","col4": "mo"},{"col2": "xy","col3": "yz","col4": "xz"}]

The script is below:

    input1= load 'map1.csv' using PigStorage('|') as (col0: int, col1: chararray);
    input2= load 'map2.csv' using PigStorage('|') as (col1: chararray, col2: chararray,col3: chararray, col4: chararray);
    input3 = GROUP input2 by col1;
    input4 = JOIN input1 by col1, input3 by col1;

It fails at the JOIN with:

    ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
    <line 4, column 40> Invalid field projection. Projected field [col1] does not exist in schema: group:chararray,input2:bag{:tuple(col1:chararray,col2:chararray,col3:chararray,col4:chararray)}.

a123
  • Did you run a describe on input4? `Invalid scalar projection: input3` might be more obvious when you do – OneCricketeer May 29 '17 at 15:09
  • `grunt> describe input3;` gives `input3: {group: chararray,input2: {(col1: chararray,col2: chararray,col3: chararray,col4: chararray)}}` and `grunt> describe input4;` gives `input4: {input1::age: int,input1::eid: chararray,input1::grade: chararray,input2::distinctGrade: chararray}` – a123 May 29 '17 at 15:48
  • Okay, now look at it... There's no `input3.col2` to generate from input4, so it's not a valid projection – OneCricketeer May 29 '17 at 16:41
  • `grunt> input4 = JOIN input1 by col1, input3 by col1;` `2017-05-29 15:41:21,706 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [col1] does not exist in schema: group:chararray,input2:bag{:tuple(col1:chararray,col2:chararray,col3:chararray,col4:chararray)}.` What I pasted earlier was incorrect, but I am still getting the error above. Can you let me know how to proceed to get the expected output? – a123 May 29 '17 at 22:45
  • Now it's `input3` that has no top-level attribute named `col1`. It's instead within a tuple of a bag. Refer to https://stackoverflow.com/questions/8051180/pig-how-to-reference-columns-in-a-foreach-after-a-join – OneCricketeer May 30 '17 at 01:11
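
Following the last comment, here is a minimal sketch of one way this could be written. It is not a confirmed solution, only an illustration under some assumptions: the aliases `rows`, `grouped`, `joined` and `result` and the field name `m` are made up for this example, the files are assumed to contain no header line, and the built-in `TOMAP` is used to hardcode the keys. After a `GROUP`, the key column is named `group`, so the join has to use `group` instead of `col1`.

    -- Load both files with the same schemas as in the question.
    input1 = LOAD 'map1.csv' USING PigStorage('|') AS (col0:int, col1:chararray);
    input2 = LOAD 'map2.csv' USING PigStorage('|') AS (col1:chararray, col2:chararray, col3:chararray, col4:chararray);

    -- Turn each row into a map with hardcoded keys, keeping col1 for grouping.
    -- `rows` and `m` are hypothetical names chosen for this sketch.
    rows = FOREACH input2 GENERATE col1, TOMAP('col2', col2, 'col3', col3, 'col4', col4) AS m;

    -- Collect the maps per col1. The grouping key is now called `group`.
    grouped = GROUP rows BY col1;

    -- Join on `group`; there is no top-level col1 in `grouped`.
    joined = JOIN input1 BY col1, grouped BY group;

    -- Project the map field out of the bag, giving one bag of maps per input1 row.
    result = FOREACH joined GENERATE input1::col0, input1::col1, grouped::rows.m;

    DUMP result;

Note that Pig prints bags and maps in its own notation, so the output will not look exactly like the JSON-style list in the expected output. If that exact format is required, it would need a further formatting step.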

0 Answers