I have the following use case, with two pipe-delimited input files.
map1.csv
col0|col1
10|a1
20|b1
map2.csv
col1|col2|col3|col4
a1|aa|ab|ac
a1|ba|bb|bc
a1|ca|cb|cc
b1|mm|mn|mo
b1|xy|yz|xz
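I need to join map1.csv with map2.csv on col1. For each col1 value that matches (say a1), I need to take the values of col2, col3 and col4 from every matching row of map2.csv, store each row as a map, and collect those maps into a list.
The map keys should be hardcoded as col2, col3 and col4.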
Expected Output:
10|a1|[{"col2": "aa","col3": "ab","col4": "ac"},{"col2": "ba","col3": "bb","col4": "bc"},{"col2": "ca","col3": "cb","col4": "cc"}]
20|b1|[{"col2": "mm","col3": "mn","col4": "mo"},{"col2": "xy","col3": "yz","col4": "xz"}]
My script so far:
input1 = load 'map1.csv' using PigStorage('|') as (col0: int, col1: chararray);
input2 = load 'map2.csv' using PigStorage('|') as (col1: chararray, col2: chararray, col3: chararray, col4: chararray);
input3 = GROUP input2 by col1;
input4 = JOIN input1 by col1, input3 by col1;
The JOIN statement fails with the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 4, column 40> Invalid field projection. Projected field [col1] does not exist in schema: group:chararray,input2:bag{:tuple(col1:chararray,col2:chararray,col3:chararray,col4:chararray)}.
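The schema printed in the error shows that after GROUP the relation input3 has only two fields: group (the grouping key) and input2 (a bag of the original rows). There is no longer a field called col1, so the join has to use group, or the key has to be renamed back. Below is a minimal sketch of one possible approach, not a verified solution. It assumes the data files do not actually contain the header rows shown above, and the aliases mapped, grouped, joined, result and the field names m and items are hypothetical names chosen for illustration.

```pig
-- Assumes map1.csv and map2.csv contain only data rows (no header line).
input1 = LOAD 'map1.csv' USING PigStorage('|') AS (col0: int, col1: chararray);
input2 = LOAD 'map2.csv' USING PigStorage('|') AS (col1: chararray, col2: chararray, col3: chararray, col4: chararray);

-- Turn each map2 row into a map with the hardcoded keys col2, col3, col4.
mapped = FOREACH input2 GENERATE col1, TOMAP('col2', col2, 'col3', col3, 'col4', col4) AS m;

-- GROUP exposes the key as 'group'; rename it back to col1 and keep only the maps.
grouped = FOREACH (GROUP mapped BY col1) GENERATE group AS col1, mapped.m AS items;

-- Both relations now have a col1 field, so the join key resolves.
joined = JOIN input1 BY col1, grouped BY col1;

-- Fields of a join result are disambiguated with the alias::field syntax.
result = FOREACH joined GENERATE input1::col0, input1::col1, grouped::items;
DUMP result;
```

The smallest change to the original script would be to keep the GROUP as it is and write the join as JOIN input1 by col1, input3 by group. Either way, Pig renders bags and maps in its own text notation (maps as [key#value]) rather than as the JSON-style list in the expected output, and the order of tuples within a bag is not guaranteed, so a further formatting step may be needed to match that output exactly.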